ISCA Archive Interspeech 2023

Conformer-based Language Embedding with Self-Knowledge Distillation for Spoken Language Identification

Feng Wang, Lingyan Huang, Tao Li, Qingyang Hong, Lin Li

Conformer-based architectures have been shown to be effective in improving the performance of spoken language identification (LID) in recent years, owing to the Conformer's superior representational capacity. However, when performing language identification on short speech segments, a significant drop in performance is often observed. In this paper, we propose a novel method to alleviate this issue by introducing a self-knowledge distillation technique into the Conformer-based LID architecture. For each sample during training, we distill the predictive distribution between the original input and the input processed by a double-ended random masking module. Experimental results demonstrate the effectiveness of the proposed method on two datasets: OLR21 with a 16,000 Hz sampling rate and LRE22 with an 8,000 Hz sampling rate. Moreover, the proposed method also enhances the performance of language identification on short-duration speech segments.
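The self-distillation objective described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the toy mean-pool linear "model", the masking span limit, and the KL-based distillation loss are all assumptions introduced for clarity; the actual system uses a Conformer encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def double_ended_random_mask(feats, max_frames=10):
    """Zero out a random span of frames at each end of the sequence
    (a stand-in for the paper's double-ended random masking module)."""
    masked = feats.copy()
    head = rng.integers(0, max_frames + 1)
    tail = rng.integers(0, max_frames + 1)
    masked[:head] = 0.0
    if tail:
        masked[-tail:] = 0.0
    return masked

def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) between two categorical distributions."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Toy stand-in for the LID model: mean-pool frames, then a linear
# projection to language posteriors (the real model is a Conformer).
T, D, num_langs = 100, 40, 5          # frames, feature dim, languages
W = rng.normal(size=(D, num_langs))   # hypothetical classifier weights
feats = rng.normal(size=(T, D))       # one utterance's features

def predict(x):
    return softmax(x.mean(axis=0) @ W)

p_original = predict(feats)
p_masked = predict(double_ended_random_mask(feats))

# Self-knowledge distillation term: pull the masked-input prediction
# toward the full-input prediction (added to the usual LID loss).
distill_loss = kl_divergence(p_original, p_masked)
```

Masking both ends of the sequence simulates short-duration inputs at training time, so minimizing the divergence teaches the model to produce full-utterance-like posteriors even when much of the segment is missing.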