ISCA Archive Interspeech 2024

Enhancing Neural Transducer for Multilingual ASR with Synchronized Language Diarization

Amir Hussein, Desh Raj, Matthew Wiesner, Daniel Povey, Paola Garcia, Sanjeev Khudanpur

In multilingual environments, seamless language switching, including code-switching (CS) within utterances, is essential for real-time applications. Conventional Automatic Speech Recognition (ASR) combined with language diarization requires post-processing to accurately synchronize language labels with the recognized words, which is a considerable challenge. In this study, we introduce a multitask learning framework that synchronizes Language Identification (LID) with ASR using a neural transducer architecture. The auxiliary LID task integrates both acoustic and lexical features. Furthermore, we use the resulting language representation as an auxiliary input to improve ASR. We demonstrate the efficacy of the proposed approach on conversational multilingual (Arabic, Spanish, Mandarin) and CS (Spanish-English, Mandarin-English) test sets.
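To make the idea of synchronized language labels concrete, the following minimal sketch assumes a decoder that emits one language label per recognized token, and shows how such token-level labels could be merged into language-tagged segments without a separate alignment step. The function name `group_by_language`, the label strings, and the example utterance are hypothetical illustrations, not part of the proposed system.

```python
from itertools import groupby
from typing import List, Tuple


def group_by_language(tokens: List[str], lang_ids: List[str]) -> List[Tuple[str, List[str]]]:
    """Merge consecutive tokens that share a language label into segments.

    Assumes (hypothetically) that the recognizer emits exactly one
    language label per token, so the two lists are already aligned.
    """
    if len(tokens) != len(lang_ids):
        raise ValueError("tokens and lang_ids must have the same length")

    segments = []
    # groupby only merges *adjacent* items with equal keys, which is what
    # diarization needs: a new segment starts at every language switch.
    for lang, pairs in groupby(zip(tokens, lang_ids), key=lambda pair: pair[1]):
        segments.append((lang, [token for token, _ in pairs]))
    return segments


# Hypothetical Spanish-English code-switched hypothesis with per-token labels.
tokens = ["quiero", "un", "coffee", "please", "gracias"]
lang_ids = ["es", "es", "en", "en", "es"]

segments = group_by_language(tokens, lang_ids)
for lang, words in segments:
    print(lang, " ".join(words))
```

Because the labels arrive already aligned with the tokens in this sketch, the diarization output reduces to a single pass over the hypothesis; a conventional pipeline would instead have to align the output of a separate language diarizer with the ASR words after decoding.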