ISCA Archive Interspeech 2022

Enroll-Aware Attentive Statistics Pooling for Target Speaker Verification

Leying Zhang, Zhengyang Chen, Yanmin Qian

A well-developed, robust speaker verification system can automatically suppress environmental noise while retaining speaker information. However, when the target speech is corrupted by an interfering speaker's voice, such a system usually cannot selectively extract only the target speaker's information. Some prior work addresses this by introducing a speech separation network that separates the target speaker's speech in advance. However, adding a speech separation network for the speaker verification task may be redundant. Here, we propose an enroll-aware attentive statistics pooling (EA-ASP) layer to help the speaker verification system extract a specific speaker's information. To evaluate the system, we simulate multi-speaker evaluation data based on VoxCeleb1. The results show that the proposed EA-ASP outperforms the baseline system by a large margin, achieving a 50% relative Equal Error Rate (EER) reduction.
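To make the idea of enrollment-conditioned pooling concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say how the enrollment information enters the pooling layer, so the scoring rule here (a scaled dot product between each frame and the enrollment embedding) is an assumption chosen for simplicity, and the names `enroll_aware_asp`, `frames`, and `enroll` are hypothetical. Only the pooling step itself, an attention-weighted mean and standard deviation over frames, follows standard attentive statistics pooling.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def enroll_aware_asp(frames, enroll, eps=1e-8):
    """Pool T frame-level vectors into one utterance-level vector.

    frames: list of T feature vectors, each of length D.
    enroll: enrollment embedding of length D (assumed to share the
            frame feature dimension; a real system might project it).
    Returns (pooled, weights), where pooled is the weighted mean
    concatenated with the weighted standard deviation (length 2 * D).
    """
    dim = len(enroll)
    # ASSUMPTION: frames are scored by a scaled dot product with the
    # enrollment embedding. The paper's actual scoring may differ.
    scores = [
        sum(f[d] * enroll[d] for d in range(dim)) / math.sqrt(dim)
        for f in frames
    ]
    weights = softmax(scores)

    # Attention-weighted first- and second-order statistics.
    mean = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    var = [
        sum(w * f[d] ** 2 for w, f in zip(weights, frames)) - mean[d] ** 2
        for d in range(dim)
    ]
    std = [math.sqrt(max(v, eps)) for v in var]
    return mean + std, weights


# Toy input: two frames aligned with the enrollment direction and one
# frame from a different (interfering) direction.
toy_frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
toy_enroll = [4.0, 0.0]
pooled, weights = enroll_aware_asp(toy_frames, toy_enroll)
```

In this toy setting, frames that resemble the enrollment embedding receive larger attention weights than the interfering frame, so the pooled statistics are dominated by the target speaker's frames.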