ISCA Archive Interspeech 2023

Dynamic Fully-Connected Layer for Large-Scale Speaker Verification

Zhida Song, Liang He, Baowei Zhao, Minqiang Xu, Yu Zheng

Mainstream x-vector systems for speaker verification usually train with a fully-connected (FC) classification layer over one-hot speaker labels. When the network is optimized on a large-scale dataset (e.g., one million speakers), this FC layer accounts for most of the computation cost and memory requirement, which can become prohibitive. We propose a dynamic fully-connected (Dynamic FC) layer for speaker verification that trades off hardware resources against system performance. The proposed Dynamic FC uses a dynamic class queue (DCQ) to store a subset of speaker identity centers and an identity-based data loading mechanism to save memory and time. The advantage of the proposed method is that the required memory depends only on the size of the DCQ and does not grow with the number of speakers in the training dataset. On the VoxCeleb dataset, the proposed method achieves an EER of 2.345% and a minDCF of 0.261 at a low memory and computation cost.
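The memory argument can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `DynamicClassQueue`, the helper `fc_weight_bytes`, the first-in-first-out eviction rule, the 512-dimensional embedding, and the queue capacity of 10,000 are all assumptions chosen for illustration, since the abstract specifies none of them. The sketch only shows that a full FC layer stores one center per training speaker, whereas a fixed-capacity queue stores a bounded number of centers regardless of how many speakers the dataset contains.

```python
from collections import OrderedDict


def fc_weight_bytes(num_classes, embed_dim, bytes_per_param=4):
    """Bytes needed to store one weight vector (center) per class in float32."""
    return num_classes * embed_dim * bytes_per_param


class DynamicClassQueue:
    """Hypothetical fixed-capacity store of speaker identity centers.

    Assumption: the oldest entry is evicted when the queue is full (FIFO);
    the abstract does not state the actual replacement policy.
    """

    def __init__(self, capacity, embed_dim):
        self.capacity = capacity
        self.embed_dim = embed_dim
        self.centers = OrderedDict()  # speaker_id -> center vector

    def update(self, speaker_id, center):
        """Insert or refresh the center for one speaker."""
        if len(center) != self.embed_dim:
            raise ValueError("center has the wrong dimensionality")
        if speaker_id in self.centers:
            # Refresh an existing speaker and mark it as most recent.
            self.centers.move_to_end(speaker_id)
        elif len(self.centers) >= self.capacity:
            # Queue is full: drop the oldest speaker center.
            self.centers.popitem(last=False)
        self.centers[speaker_id] = list(center)

    def weight_bytes(self, bytes_per_param=4):
        """Upper bound on center storage; independent of dataset size."""
        return self.capacity * self.embed_dim * bytes_per_param


if __name__ == "__main__":
    # Full FC layer over one million speakers (embedding size is assumed).
    full_fc = fc_weight_bytes(num_classes=1_000_000, embed_dim=512)
    # Queue with an assumed capacity of 10,000 centers.
    dcq = DynamicClassQueue(capacity=10_000, embed_dim=512)
    print(full_fc)             # 2048000000 bytes, about 2 GB
    print(dcq.weight_bytes())  # 20480000 bytes, about 20 MB
```

Under these assumed sizes the queue holds 100 times fewer parameters than the full layer, and that bound stays the same if the number of training speakers grows. Gradients and optimizer state, which scale with the same parameter count, are left out of the estimate.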