ISCA Archive Interspeech 2024

Speaker Personalization for Automatic Speech Recognition using Weight-Decomposed Low-Rank Adaptation

George Joseph, Arun Baby

Personalizing automatic speech recognition (ASR) for voice assistant systems is often considered a holy grail, requiring meticulous attention to detail in model optimization. When speaker data is limited, the selection of hyper-parameters becomes paramount in fine-tuning large ASR models. One effective method for this optimization is low-rank adaptation (LoRA), which has proved instrumental in enhancing the performance of large language models (LLMs). A variant of LoRA, Weight-Decomposed Low-Rank Adaptation (DoRA), also promises enhanced performance. In our study, we employed LoRA and DoRA to refine a state-of-the-art cascaded conformer transducer model for speaker personalization. This involved adding a small number of speaker-specific weights to the existing model and fine-tuning only those weights. Experimental assessments show an average relative improvement of 20% in word error rate across speakers with limited data, demonstrating the efficacy of these adapters in addressing the challenge of personalizing ASR systems in real-world applications.
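To illustrate the adaptation scheme the abstract describes, the sketch below shows a generic DoRA-style wrapper around a frozen linear layer in PyTorch: the pretrained weight W0 is split into a trainable magnitude and a direction updated through a low-rank product B @ A, so that W' = m * (W0 + BA) / ||W0 + BA|| (column-wise norm). This is a minimal assumption-laden example of the general DoRA technique, not the authors' implementation; the class name `DoRALinear`, the rank, and the initialization choices are hypothetical.

```python
import torch
import torch.nn as nn

class DoRALinear(nn.Module):
    """Hypothetical DoRA-style adapter around a frozen pretrained linear layer.

    W' = m * (W0 + B @ A) / ||W0 + B @ A||, with the norm taken per column,
    where m is a trainable magnitude vector and B, A are low-rank factors.
    """

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weights

        out_f, in_f = base.weight.shape
        # Low-rank factors: B starts at zero so the adapted weight
        # equals W0 at the beginning of fine-tuning.
        self.A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_f, rank))
        # Trainable magnitude, initialized from the column norms of W0.
        self.m = nn.Parameter(base.weight.norm(p=2, dim=0, keepdim=True))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        merged = self.base.weight + self.B @ self.A          # W0 + BA
        direction = merged / merged.norm(p=2, dim=0, keepdim=True)
        weight = self.m * direction                          # rescale columns
        return nn.functional.linear(x, weight, self.base.bias)

# Usage: wrap a projection inside a conformer block, then fine-tune only
# the small set of speaker-specific parameters (A, B, m).
layer = DoRALinear(nn.Linear(512, 512), rank=4)
y = layer(torch.randn(2, 512))
```

Only `A`, `B`, and `m` receive gradients, which matches the abstract's point: a small number of speaker-specific weights are added and fine-tuned while the large pretrained model stays frozen.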