ISCA Archive Interspeech 2025

Adapting Whisper for low-resource Hindi-English Code-Mix speech with on-the-fly Augmentation & LLM-Synthesised Data

Astik Biswas, Oleg Shevelev, Amine Abdaoui, Vivek Tyagi, Abdelmoumene Boumadane

Code-switching (CS) automatic speech recognition (ASR) faces challenges due to language confusion arising from accents, acoustic similarities, and seamless transitions between languages. In multilingual India, CS is prevalent, yet adapting pre-trained Whisper for low-resource Indian CS-ASR remains under-explored. This study explores strategies for adapting Whisper with limited data. First, we propose language prompts for fine-tuning and on-the-fly code-mixed data simulation to handle language switches. Second, we use Llama for few-shot CS text generation, coupled with audio synthesis, to create synthetic data for fine-tuning the Whisper model. Experiments on a Hindi-English CS dataset show promising results, demonstrating the techniques' effectiveness. These findings are applicable to other multilingual contexts, aiding Whisper’s adaptation to new domains.
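
The abstract does not specify how the on-the-fly code-mixed simulation is implemented. One plausible reading, offered only as a minimal sketch and not as the authors' method, is to splice monolingual Hindi and English utterances together at batch-construction time, alternating languages so that each training sample contains at least one switch. The function name `simulate_code_mixed`, the `Utterance` type, and the `max_segments` parameter are hypothetical and introduced purely for illustration.

```python
import random
from typing import List, Tuple

# Hypothetical representation: (waveform samples, transcript).
Utterance = Tuple[List[float], str]


def simulate_code_mixed(
    hindi: List[Utterance],
    english: List[Utterance],
    rng: random.Random,
    max_segments: int = 3,
) -> Utterance:
    """Splice monolingual utterances into one pseudo code-mixed sample.

    Assumption: segments strictly alternate between the two languages,
    so every output contains at least one language switch.
    """
    n_segments = rng.randint(2, max_segments)  # at least two segments
    pools = [hindi, english]
    start = rng.randrange(2)  # randomly start with Hindi or English

    audio: List[float] = []
    texts: List[str] = []
    for i in range(n_segments):
        wav, text = rng.choice(pools[(start + i) % 2])
        audio.extend(wav)   # concatenate waveforms
        texts.append(text)  # concatenate transcripts in the same order
    return audio, " ".join(texts)
```

Because the sample is built each time it is requested, the model would see a different mixture in every epoch without any extra audio being stored. A real pipeline would also need to handle sampling rates, silence trimming, and speaker mismatch at the splice points, none of which this sketch attempts.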