ISCA Archive Interspeech 2024

Adapter pre-training for improved speech recognition in unseen domains using low resource adapter tuning of self-supervised models

Sathvik Udupa, Jesuraj Bandekar, Saurabh Kumar, Deekshitha G, Sandhya B, Abhayjeet S, Savitha Murthy, Priyanka Pai, Srinivasa Raghavan, Raoul Nanavati, Prasanta Kumar Ghosh

Adapter tuning is an approach for fine-tuning large neural network models on new tasks. These methods can be used to efficiently fine-tune large self-supervised learning (SSL) models for speech recognition tasks. In this work, we aim to perform improved low-resource adaptation of SSL features from a source to a target domain. Toward this, we experiment with adapter pre-training for Wav2Vec2-based models over different source and target configurations. We experiment on 3 datasets consisting of 14 languages, including very low-resource languages. Further, we show the consistency of this method across different adapter dimensions and analyse the feature transformation due to the adapter pre-training process. With the proposed methods, we obtain relative improvements of 10%-30% in WER and CER with Viterbi decoding in 13 languages. Further, we obtain consistent performance gains using LM decoding on many of these languages.
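
To make the adapter-tuning idea in the abstract concrete, the sketch below shows a standard bottleneck adapter module and a way of restricting training to the adapters while the SSL backbone stays frozen. This is a minimal illustration, not the authors' implementation: the class and function names (Adapter, trainable_parameters), the adapter dimension, the GELU activation, and the layer placement are assumptions not specified in the abstract.

import torch
import torch.nn as nn

class Adapter(nn.Module):
    # Bottleneck adapter: down-project, non-linearity, up-project, residual connection.
    # hidden_dim is the SSL encoder width; adapter_dim is the (small) bottleneck size.
    def __init__(self, hidden_dim: int, adapter_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, adapter_dim)
        self.act = nn.GELU()
        self.up = nn.Linear(adapter_dim, hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The residual add preserves the frozen SSL features; the adapter learns a
        # small domain-specific correction on top of them.
        return x + self.up(self.act(self.down(x)))

def trainable_parameters(model: nn.Module):
    # Hypothetical helper: freeze every parameter, then re-enable gradients only for
    # Adapter modules, so an optimizer sees just the adapter weights.
    for p in model.parameters():
        p.requires_grad = False
    for m in model.modules():
        if isinstance(m, Adapter):
            for p in m.parameters():
                p.requires_grad = True
    return [p for p in model.parameters() if p.requires_grad]

Under this reading, "adapter pre-training" would correspond to first optimizing the adapter parameters on source-domain data and then continuing adapter tuning on the low-resource target-domain data, with the Wav2Vec2 backbone frozen throughout; the exact schedule and insertion points are described in the full paper.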