ISCA Archive Interspeech 2023

End-to-End Neural Speaker Diarization with Absolute Speaker Loss

Chao Wang, Jie Li, Xiang Fang, Jian Kang, Yongxiang Li

End-to-end neural speaker diarization (EEND) has proved to be a promising approach to speaker diarization, especially for recordings with overlapping speech. In this paper, we propose a new approach to EEND that incorporates an absolute speaker loss function, which forces the network to consider global speaker identity information during training while keeping inference one-stage. In addition, we modify the pre-processing module so that feature splicing is no longer needed, which provides longer contextual information and supports longer input recordings at inference time. With the proposed one-stage system, we achieve better results on simulated conversation-like data sets built from LibriSpeech than EEND-VC, a two-stage system. We evaluate with different chunking settings, recording durations, and overlap ratios, and achieve up to 70% relative improvement in diarization error rate (DER) over the EEND-VC baseline on short recordings and up to 7.5% on long recordings.
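The relative improvements quoted above are relative reductions in DER with respect to the baseline. A minimal sketch of that arithmetic follows, assuming the usual definition (baseline minus proposed, divided by baseline). The function name and the DER values are hypothetical and chosen only to illustrate the calculation; they are not results from the paper.

```python
def relative_der_improvement(baseline_der: float, proposed_der: float) -> float:
    """Relative DER reduction of a proposed system over a baseline, in percent.

    Both arguments are DER values expressed in the same unit (e.g. percent).
    """
    if baseline_der <= 0:
        raise ValueError("baseline_der must be positive")
    return 100.0 * (baseline_der - proposed_der) / baseline_der


# Hypothetical DER values (in percent), used only to show the arithmetic.
example = relative_der_improvement(baseline_der=10.0, proposed_der=3.0)
print(f"relative improvement: {example:.1f}%")
```

Under this definition, a relative figure does not indicate the absolute DER gap: the same percentage corresponds to a smaller absolute difference when the baseline DER is already low.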