ISCA Archive Interspeech 2024

Real-time scheme for rapid extraction of speaker embeddings in challenging recording conditions

Kai Liu, Ziqing Du, Zhou Huan, Xucheng Wan, Naijun Zheng

Speaker embeddings play a crucial role in speaker-related tasks such as speaker verification, diarization, target speaker extraction, and voice conversion. Various networks have been proposed for generating speaker embeddings from the enrollment speech of the target speaker. In real-time speaker-related tasks, however, a pristine recording environment is typically unavailable. Consequently, enrollment speech may be contaminated by background noise and interference from non-target speakers, potentially compromising task performance. In this study, we present a three-stage progressive filtering scheme for rapidly extracting speaker embeddings in noisy recording scenarios, and validate its effectiveness and efficiency through a real-time target speaker extraction task conducted on real meeting data.