ISCA Archive Interspeech 2024

Bridging Child-Centered Speech Language Identification and Language Diarization via Phonetics

Yujia Wang, Hexin Liu, Leibny Paola Garcia

Language Diarization (LD) can be viewed as an extension of Language Identification (LID) that removes the monolingual input assumption. Motivated by this connection and by the challenges inherent in Code-Switching (CS) child-centered speech, we extend PHO-LID, an LID model that incorporates acoustic and phonotactic information without requiring phoneme annotation, to LD. Our method explores three avenues for adapting PHO-LID to LD: a temporal slicing scheme that bridges LID and LD, an embedding modification that enriches the information available for LD, and a back-end scoring method that facilitates fine-tuning. Compared to the baseline, trained on a simulated out-of-domain dataset, SEAME_sim, our method achieves a 15.82% relative accuracy improvement on MERLIon, a child-centered CS speech corpus. The back-end scoring preserves pre-trained knowledge during fine-tuning, yielding a 16.93% relative accuracy improvement on the pre-training SEAME_sim test set without compromising performance on the fine-tuning test set.
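
To make the link between LID and LD concrete, the following is a minimal sketch of one way a temporal slicing scheme could turn slice-level language decisions into diarization segments. It is not the PHO-LID implementation: the function names `slice_spans` and `diarize`, the fixed non-overlapping window, the mean-score decision rule, and the per-frame score input are all assumptions made for illustration only.

```python
from typing import List, Tuple


def slice_spans(num_frames: int, win: int) -> List[Tuple[int, int]]:
    """Split [0, num_frames) into consecutive [start, end) spans of length win.

    Hypothetical helper: the last span may be shorter than win.
    """
    if win <= 0:
        raise ValueError("win must be positive")
    return [(s, min(s + win, num_frames)) for s in range(0, num_frames, win)]


def diarize(frame_scores: List[List[float]], win: int) -> List[Tuple[int, int, int]]:
    """Assign one language per slice, then merge adjacent slices with the same label.

    frame_scores[t][k] is an assumed per-frame score for language k.
    Returns (start_frame, end_frame, language_index) segments.
    """
    segments: List[Tuple[int, int, int]] = []
    for start, end in slice_spans(len(frame_scores), win):
        chunk = frame_scores[start:end]
        num_langs = len(chunk[0])
        # Slice-level LID decision: language with the highest mean score.
        means = [sum(f[k] for f in chunk) / len(chunk) for k in range(num_langs)]
        lang = max(range(num_langs), key=means.__getitem__)
        if segments and segments[-1][2] == lang:
            # Same language as the previous slice: extend that segment.
            segments[-1] = (segments[-1][0], end, lang)
        else:
            segments.append((start, end, lang))
    return segments


if __name__ == "__main__":
    # Toy input: 4 frames favouring language 0, then 6 frames favouring language 1.
    scores = [[0.9, 0.1]] * 4 + [[0.2, 0.8]] * 6
    print(diarize(scores, win=2))
```

In this sketch, a shorter window gives finer language boundaries at the cost of less context per slice-level decision; the actual window length and decision rule used by the authors are not stated in this section.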