ISCA Archive Interspeech 2024

Speaker Change Detection with Weighted-sum Knowledge Distillation based on Self-supervised Pre-trained Models

Hang Su, Yuxiang Kong, Lichun Fan, Peng Gao, Yujun Wang, Zhiyong Wu

Speaker Change Detection (SCD) is an essential problem in speech processing with applications in many fields. Self-supervised models have shown impressive performance on many downstream tasks under the pre-training and fine-tuning paradigm. However, applying a fine-tuned self-supervised pre-trained model to the frame-level SCD task is impractical in industrial settings, which typically require a smaller model that consumes fewer computational resources. To tackle this issue, we propose using Knowledge Distillation (KD) to leverage the capabilities of the self-supervised model. First, a basic KD method based on the pre-trained model is proposed. Then, a weighted-sum KD method is proposed to selectively extract information from the pre-trained model. Experimental results demonstrate the effectiveness of the basic KD method and a further improvement from the weighted-sum KD method. The proposed method is more suitable for industrial applications than fine-tuning.
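The weighted-sum KD idea described above can be sketched as follows: the teacher's hidden layers are combined through learnable, softmax-normalized weights, and the student's frame-level features are regressed toward that combination. All shapes, variable names, and the MSE objective are illustrative assumptions; the abstract does not specify the exact loss or architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: a teacher with L hidden layers over T frames of
# dimension D, and a student whose frame features are projected to D.
L, T, D = 4, 10, 8
teacher_layers = rng.standard_normal((L, T, D))  # teacher hidden states
student_feats = rng.standard_normal((T, D))      # student frame features

# Learnable per-layer weights (shown here at initialization); softmax
# normalization makes the target a convex combination of teacher layers,
# so the distillation can selectively emphasize informative layers.
w = np.zeros(L)
alpha = np.exp(w) / np.exp(w).sum()

# Weighted-sum teacher target: sum_l alpha_l * H_l -> shape (T, D).
target = np.tensordot(alpha, teacher_layers, axes=1)

# Frame-level distillation loss (MSE between student and target);
# in training, gradients would flow to both the student and w.
kd_loss = np.mean((student_feats - target) ** 2)
```

In a real system the student would also carry a frame-level SCD classification loss alongside this distillation term, with the layer weights learned jointly.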