ISCA Archive Interspeech 2025

SCD-Conformer: Semantic Content Disentanglement for Text-Independent Speaker Verification

Shanshan Yao, Dianlong Liu, Tian Li

Text-independent speaker verification (TISV) verifies a speaker's identity without relying on any particular semantic content. To eliminate the influence of the semantic content of utterances on speaker features, we propose an SCD-Conformer for semantic content disentanglement. First, a dual-branch Conformer extracts the speaker feature and the semantic content feature separately; the content feature is extracted directly by a pre-trained model, so it adds no trainable parameters or computational complexity to training. Then, both frame-level and utterance-level disentanglement methods are used to separate the speaker feature from the content feature, using a dimension matching module, an aggregation module, and a similarity module. Experimental results show that disentangling at the utterance level is more effective than at the frame level, and that combining the two works best, improving performance by 11% on average over the best baseline.
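As a rough illustration of how the three modules could fit together, the sketch below computes a disentanglement penalty at the frame level, at the utterance level, and as a combination of the two. The abstract does not specify the modules' internals, so everything concrete here is an assumption: a linear projection stands in for dimension matching, mean pooling for aggregation, absolute cosine similarity for the similarity module, and `alpha` is a hypothetical mixing weight. It is not the authors' implementation.

```python
import math

def project(frames, weight):
    """Dimension matching (assumed: a linear map). weight has d_out rows of length d_in."""
    return [[sum(x * w for x, w in zip(frame, row)) for row in weight]
            for frame in frames]

def mean_pool(frames):
    """Aggregation (assumed: mean pooling) from frame level to one utterance-level vector."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def frame_level_penalty(spk_frames, con_frames):
    """Mean absolute cosine similarity between time-aligned frames (assumed form)."""
    pairs = list(zip(spk_frames, con_frames))
    return sum(abs(cosine(s, c)) for s, c in pairs) / len(pairs)

def utterance_level_penalty(spk_frames, con_frames):
    """Absolute cosine similarity between the pooled utterance vectors (assumed form)."""
    return abs(cosine(mean_pool(spk_frames), mean_pool(con_frames)))

def combined_penalty(spk_frames, con_frames, weight, alpha=0.5):
    """Weighted sum of both penalties; alpha is a hypothetical hyperparameter."""
    matched = project(con_frames, weight)  # bring content features to the speaker dimension
    return (alpha * frame_level_penalty(spk_frames, matched)
            + (1.0 - alpha) * utterance_level_penalty(spk_frames, matched))

# Toy inputs: 2 frames, speaker dimension 2, content dimension 3.
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
spk = [[1.0, 0.0], [1.0, 0.0]]
orthogonal_content = [[0.0, 2.0, 0.0], [0.0, 3.0, 0.0]]
aligned_content = [[5.0, 0.0, 0.0], [7.0, 0.0, 0.0]]

low = combined_penalty(spk, orthogonal_content, weight)
high = combined_penalty(spk, aligned_content, weight)
```

With these toy inputs the penalty is lowest when the projected content vectors are orthogonal to the speaker vectors and highest when they point the same way, which is the behavior a disentanglement objective of this kind would minimize during training.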