ISCA Archive Interspeech 2022

A Novel Phoneme-based Modeling for Text-independent Speaker Identification

Xin Wang, Chuan Xie, Qiang Wu, Huayi Zhan, Ying Wu

Text-independent speaker identification has attracted growing attention, yet it remains challenging to extract speaker-specific features from speech with arbitrary content. End-to-end systems trained with utterance-level features suffer from performance degradation caused by variation in speech content. To address this issue, this paper proposes a novel phoneme-based approach with two key features. First, it restricts the variety of speech content by splitting each utterance into a set of phoneme segments and develops phoneme-constrained models to extract segment-level speaker embeddings. Second, it uses a soft-voting mechanism with mono-phonemic thresholds and weights to combine the results of different phonemes. Experimental results on the AISHELL and ASRU2019 datasets show that the proposed approach is effective and robust: it outperforms state-of-the-art methods in both EER and accuracy, especially when the phonemic mismatch between the enrollment and test utterances is larger. In addition, the proposed system is efficient in that it can be trained well on a small-scale dataset.
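
To make the soft-voting idea concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not give the scoring function or the combination rule, so everything here is an assumption made for illustration: cosine similarity as the per-phoneme score, a weighted average of score-minus-threshold margins as the vote, and the hypothetical names `cosine`, `soft_vote`, `thresholds`, and `weights`.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two 1-D embeddings."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def soft_vote(test_segments, enrolled, thresholds, weights):
    """Combine per-phoneme scores into one score per enrolled speaker.

    test_segments: dict mapping phoneme -> test segment embedding
    enrolled:      dict mapping speaker -> {phoneme -> enrollment embedding}
    thresholds:    dict mapping phoneme -> per-phoneme threshold (assumed form)
    weights:       dict mapping phoneme -> per-phoneme weight (assumed form)
    Returns (best_speaker, scores).
    """
    scores = {}
    for speaker, per_phoneme in enrolled.items():
        weighted_sum = 0.0
        weight_total = 0.0
        for phoneme, embedding in test_segments.items():
            # Skip phonemes missing from this speaker's enrollment data.
            if phoneme not in per_phoneme:
                continue
            # Margin above the phoneme's own threshold (assumed vote form).
            margin = cosine(embedding, per_phoneme[phoneme]) - thresholds[phoneme]
            weighted_sum += weights[phoneme] * margin
            weight_total += weights[phoneme]
        # Speakers with no phoneme in common with the test get the lowest score.
        scores[speaker] = weighted_sum / weight_total if weight_total > 0 else float("-inf")
    best_speaker = max(scores, key=scores.get)
    return best_speaker, scores


# Toy data: two speakers, two phonemes, 2-D embeddings (all values made up).
enrolled = {
    "spk_a": {"a": [1.0, 0.0], "i": [0.0, 1.0]},
    "spk_b": {"a": [0.0, 1.0], "i": [1.0, 0.0]},
}
test_segments = {"a": [0.9, 0.1], "i": [0.2, 0.8]}
thresholds = {"a": 0.5, "i": 0.5}
weights = {"a": 1.0, "i": 2.0}

best_speaker, scores = soft_vote(test_segments, enrolled, thresholds, weights)
print(best_speaker, scores)
```

In this toy setup the test segments lie close to the `spk_a` enrollment embeddings for both phonemes, so `spk_a` receives the higher combined score. Skipping phonemes that are absent from enrollment is one simple way to handle phonemic mismatch between enrollment and test utterances; the paper's actual handling may differ.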