ISCA Archive Interspeech 2024

Variable Segment Length and Domain-Adapted Feature Optimization for Speaker Diarization

Chenyuan Zhang, Linkai Luo, Hong Peng, Wei Wen

In speaker diarization, choosing a suitable segment length remains a challenge. Long segments may contain multiple speakers, leading to unreliable embeddings, while short segments may lack sufficient information. To address this, we propose a variable segment length approach based on a mixed segment recognition (MSR) network. The MSR module distinguishes segments containing multiple speakers from those containing a single speaker. Segments identified as mixed are re-cut until they are pure or reach the minimum length. In addition, we propose a domain-adapted feature optimization scheme that fine-tunes the pre-trained speaker embedding extractor, using both a specific data augmentation and a distance loss function to improve the embeddings of the remaining segments that still contain speaker alternation and overlap. Experimental results demonstrate the effectiveness of our method: it achieves a relative improvement of 25.5% in diarization error rate over the baseline and surpasses recent state-of-the-art methods.
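The re-cutting loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how a mixed segment is re-cut, so splitting at the midpoint is an assumption made here. The `is_mixed` callable stands in for the MSR network, and `toy_is_mixed`, `recut`, and the 0.5 s minimum length are hypothetical names and values chosen only for illustration.

```python
from typing import Callable, List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def recut(segment: Segment,
          is_mixed: Callable[[Segment], bool],
          min_len: float) -> List[Segment]:
    """Re-cut a segment until each piece is judged pure or cannot be split further.

    Assumption: a mixed segment is split at its midpoint. The abstract only
    states that mixed segments are re-cut until pure or at minimum length.
    """
    start, end = segment
    length = end - start
    # Stop when the segment is judged pure, or when halving it would
    # produce pieces shorter than the minimum length.
    if not is_mixed(segment) or length / 2 < min_len:
        return [segment]
    mid = start + length / 2
    return (recut((start, mid), is_mixed, min_len)
            + recut((mid, end), is_mixed, min_len))


def toy_is_mixed(segment: Segment, change_point: float = 1.2) -> bool:
    """Hypothetical stand-in for the MSR network.

    Flags a segment as mixed when a single known speaker change point
    falls strictly inside it.
    """
    start, end = segment
    return start < change_point < end


if __name__ == "__main__":
    for seg in recut((0.0, 4.0), toy_is_mixed, min_len=0.5):
        print(seg, "mixed" if toy_is_mixed(seg) else "pure")
```

With the toy change point at 1.2 s, the 4 s segment is split into (0.0, 1.0), (1.0, 1.5), (1.5, 2.0), and (2.0, 4.0). The piece (1.0, 1.5) is still mixed but already at the minimum length, which corresponds to the remaining segments with speaker alternation that the feature optimization step is meant to handle.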