ISCA Archive Interspeech 2022

MIM-DG: Mutual information minimization-based domain generalization for speaker verification

Woohyun Kang, Md Jahangir Alam, Abderrahim Fathan

In the field of speaker verification, the current trend is to train a neural network-based speaker-discriminative system and use its hidden representation as a speaker embedding vector. Although this framework has shown impressive performance in various speaker verification tasks, its performance is limited in mismatched conditions because the embeddings retain variability unrelated to speaker identity. To overcome this problem, we propose a novel training strategy that regularizes the embedding network so that the embeddings carry minimal information about nuisance attributes. More specifically, the proposed method minimizes the mutual information between the speaker embedding and the nuisance labels during training, where the mutual information is estimated using statistics obtained from an auxiliary normalizing flow model. The proposed method is evaluated on cross-lingual and multi-genre speaker verification datasets, and the results show that the proposed strategy can effectively minimize the within-speaker variability in the embedding space.
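
As an illustration of the quantity being minimized, the following is a minimal sketch of a plug-in estimate of the mutual information I(e; n) = E[log p(e | n) - log p(e)] between embeddings e and discrete nuisance labels n. It is not the method of the paper: per-class diagonal Gaussians stand in for the auxiliary normalizing flow, and the function name `estimate_mi` and the toy data are hypothetical.

```python
import numpy as np

def estimate_mi(emb, labels):
    """Plug-in estimate of I(e; n) in nats.

    Assumption: p(e | n) is modelled by one diagonal Gaussian per label,
    a simple stand-in for a learned density model such as a flow.
    """
    classes = np.unique(labels)
    log_priors = np.log([(labels == c).mean() for c in classes])
    log_lik = []
    for c in classes:
        x = emb[labels == c]
        mean, var = x.mean(axis=0), x.var(axis=0) + 1e-6
        # log p(e | n = c) for every sample, summed over dimensions
        ll = -0.5 * (np.log(2 * np.pi * var) + (emb - mean) ** 2 / var)
        log_lik.append(ll.sum(axis=1))
    log_lik = np.stack(log_lik, axis=1)              # shape (N, K)
    # log p(e) = log sum_c p(c) p(e | c)
    log_marginal = np.logaddexp.reduce(log_lik + log_priors, axis=1)
    idx = np.searchsorted(classes, labels)           # column of each true label
    log_cond = log_lik[np.arange(len(emb)), idx]
    return float(np.mean(log_cond - log_marginal))

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=4000)               # two nuisance classes
# Embeddings that ignore the nuisance label: MI should be close to 0.
emb_indep = rng.normal(size=(4000, 2))
# Embeddings shifted by the nuisance label: MI approaches log(2) nats.
emb_dep = rng.normal(size=(4000, 2)) + 5.0 * (2 * labels[:, None] - 1)

mi_indep = estimate_mi(emb_indep, labels)
mi_dep = estimate_mi(emb_dep, labels)
print(mi_indep, mi_dep)
```

In a training setup of the kind the abstract describes, an estimate like this would be added as a penalty to the speaker classification loss, so that lower values correspond to embeddings that reveal less about the nuisance attribute.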