ISCA Archive Interspeech 2022

Light-Weight Speaker Verification with Global Context Information

Miseul Kim, Zhenyu Piao, Seyun Um, Ran Lee, Jaemin Joh, Seungshin Lee, Hong-Goo Kang

In this paper, we propose a light-weight speaker verification (SV) system that utilizes the characteristics of utterance-level global features. Many recent SV systems employ convolutional neural networks (CNNs) to extract representative speaker features from the input utterances. However, the receptive field available during feature extraction is limited by the localized structure of the convolutional layers. To effectively extract utterance-level global speaker representations, we introduce a novel architecture that combines a CNN with a self-attention network and exploits the relationship between local and aggregated global features. The global features are continuously updated at every analysis block through a point-wise attentive summation over the local features. We also adopt a densely connected CNN structure (DenseNet) to reliably estimate speaker-related local features with a small number of model parameters. Our proposed model achieves higher speaker verification performance, with an EER of 1.935%, using a significantly smaller number of parameters (1.2M), a 16% reduction in model size compared to the baseline models.
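The abstract does not specify the exact formulation of the block, so the following is only a minimal PyTorch-style sketch of one possible analysis block under stated assumptions: a DenseNet-style convolution produces local features, and a running utterance-level global vector is updated by a point-wise attentive summation over them. All names, projections, and dimensions (GlobalContextBlock, local_proj, global_dim, etc.) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class GlobalContextBlock(nn.Module):
    """Hypothetical analysis block: dense local convolution plus a
    point-wise attentive update of an utterance-level global vector."""

    def __init__(self, in_ch: int, growth: int, global_dim: int):
        super().__init__()
        # DenseNet-style layer: its output is concatenated with its input.
        self.conv = nn.Sequential(
            nn.BatchNorm2d(in_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, growth, kernel_size=3, padding=1),
        )
        out_ch = in_ch + growth
        # Projections for point-wise attention between local features
        # and the current global vector (assumed formulation).
        self.local_proj = nn.Conv2d(out_ch, global_dim, kernel_size=1)
        self.score = nn.Conv2d(global_dim, 1, kernel_size=1)
        self.global_proj = nn.Linear(global_dim, global_dim)

    def forward(self, x, g):
        # x: (B, C, F, T) local feature map; g: (B, D) global vector.
        local = torch.cat([x, self.conv(x)], dim=1)       # dense connection
        keys = self.local_proj(local)                     # (B, D, F, T)
        # Point-wise scores conditioned on the current global vector.
        scores = self.score(keys + g[:, :, None, None])   # (B, 1, F, T)
        weights = torch.softmax(scores.flatten(2), dim=-1).view_as(scores)
        # Attentive summation of local features refreshes the global vector.
        summary = (keys * weights).sum(dim=(2, 3))        # (B, D)
        g = g + self.global_proj(summary)
        return local, g
```

Stacking such blocks and feeding the final global vector to a classifier or embedding head would mirror the local/global interaction described above; the actual layer sizes and attention form in the paper may differ.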