This paper introduces the sub-3-second problem in speaker verification, a short-duration task that has rarely been explored. The task is underexplored largely because text-dependent speaker verification (TD-SV) corpora require labor-intensive annotation and costly recording. To address this, we propose an automatic pipeline that extracts short phrases from text-independent speaker verification (TI-SV) corpora. An automatic speech recognition (ASR) model identifies phrases and their timestamps, and N-gram analysis ensures that the selected phrases are common across speakers, so that enough trials can be formed. Using this pipeline, we created Sub3Vox, a TD-SV corpus derived from VoxCeleb1 that contains 3.7 million short utterances from 1,250 speakers, far more than existing TD-SV corpora. Results show that matching the enrollment and test phrases in TD-SV reduces the equal error rate (EER) by up to 45.23%. Additionally, shortening the test utterances causes a significant performance drop for TI-SV but only a minor one for TD-SV. This provides the first analysis of how phrase length affects sub-3-second performance.
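The phrase-selection step can be illustrated with a minimal sketch. It is not the authors' implementation: the `Word` record, the `extract_common_phrases` function, the toy input, and the thresholds `n`, `max_dur`, and `min_speakers` are all hypothetical, and the sketch assumes the ASR model returns word-level timestamps for each utterance. It collects every n-gram whose span is shorter than 3 seconds and keeps only the phrases spoken by enough distinct speakers.

```python
from collections import defaultdict
from typing import NamedTuple


class Word(NamedTuple):
    """One ASR word with its timestamps (hypothetical record format)."""
    text: str
    start: float  # seconds
    end: float    # seconds


def extract_common_phrases(utterances, n=2, max_dur=3.0, min_speakers=2):
    """Map each n-gram phrase to its (speaker, start, end) segments.

    Keeps only segments shorter than max_dur seconds, and only phrases
    spoken by at least min_speakers distinct speakers. All thresholds
    are illustrative assumptions, not values from the paper.
    """
    segments = defaultdict(list)
    for speaker, words in utterances:
        # Slide a window of n consecutive words over the utterance.
        for i in range(len(words) - n + 1):
            window = words[i:i + n]
            start, end = window[0].start, window[-1].end
            if end - start < max_dur:
                phrase = " ".join(w.text.lower() for w in window)
                segments[phrase].append((speaker, start, end))
    # Drop phrases that too few distinct speakers have said.
    return {
        phrase: segs
        for phrase, segs in segments.items()
        if len({spk for spk, _, _ in segs}) >= min_speakers
    }


# Toy ASR output for two speakers (made-up data for illustration only).
utterances = [
    ("spk_a", [Word("you", 0.0, 0.2), Word("know", 0.2, 0.5),
               Word("what", 0.5, 0.8)]),
    ("spk_b", [Word("I", 0.0, 0.1), Word("mean", 0.1, 0.4),
               Word("you", 1.0, 1.2), Word("know", 1.2, 1.6)]),
]
common = extract_common_phrases(utterances)
```

On this toy input only "you know" is shared by both speakers, so it is the only phrase retained. Its two segments could then serve as a matched-phrase enrollment and test pair.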