ISCA Archive Interspeech 2024

Improving Multilingual ASR Robustness to Errors in Language Input

Brady Houston, Omid Sadjadi, Zejiang Hou, Srikanth Vishnubhotla, Kyu J. Han

Explicitly adding language information to multilingual ASR models during training has been shown to improve their performance. However, this also requires providing language information during inference. In cascaded systems, the language label may come from an external language identification model, which is susceptible to errors. In this work, we characterize how sensitive several common language-incorporation strategies used in multilingual ASR are to errors in the language input. We show that some of these strategies are highly sensitive to the correctness of the language information supplied at inference time, and we demonstrate that introducing a small amount of language label noise during training can greatly improve the model's robustness to incorrect language information. As multilingual ASR continues to become more common, this work demonstrates the importance of understanding the sensitivity of these models to language inputs and of ensuring that models are robust to errors.
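The label-noise idea described above can be illustrated with a minimal sketch: with a small probability, the ground-truth language label attached to a training utterance is swapped for a randomly chosen incorrect one before being fed to the model. The language set, noise rate, and the helper name corrupt_language_label below are illustrative assumptions, not details taken from the paper.

```python
import random

# Example language set; the paper's actual language inventory is not assumed here.
LANGUAGES = ["en", "de", "es", "fr", "hi", "ja"]

def corrupt_language_label(true_lang: str, noise_prob: float = 0.05) -> str:
    """Return the language label to feed the model for one training example.

    With probability `noise_prob` (an assumed value), replace the correct label
    with a uniformly sampled incorrect one, so the model learns not to rely
    blindly on the language input at inference time.
    """
    if random.random() < noise_prob:
        wrong = [lang for lang in LANGUAGES if lang != true_lang]
        return random.choice(wrong)
    return true_lang

# Hypothetical usage inside a data-loading loop:
# for utterance, true_lang in training_examples:
#     lang_input = corrupt_language_label(true_lang, noise_prob=0.05)
#     model_inputs = build_inputs(utterance, lang_input)
```

In this sketch the corruption is applied independently per example, which keeps the expected fraction of noisy labels fixed at the chosen rate; other schemes (e.g., per-batch or curriculum-style noise) would be equally compatible with the robustness goal the abstract describes.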