ISCA Archive Interspeech 2023

Domain Adaptive Self-supervised Training of Automatic Speech Recognition

Cong-Thanh Do, Rama Doddipatla, Mohan Li, Thomas Hain

This paper explores domain adaptive self-supervised training of automatic speech recognition (ASR). Unlabeled data from the target domain can be used to train the self-supervised pre-trained model, in the fine-tuning stage through semi-supervised approaches for the ASR task, or in both. Here we focus specifically on how semi-supervised approaches can enhance domain adaptation of pre-trained models built using self-supervised learning (SSL). For the purpose of this study, we treat different English accents as data from different domains. ASR experiments targeting a single domain achieve relative word error rate (WER) reductions in the range 2.7-41.8%, depending on the extent of domain mismatch, while in the multiple-domain setting we achieve a relative WER reduction of 8% on average by using semi-supervised fine-tuning on top of the model pre-trained with target-domain data using SSL.
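
For readers unfamiliar with the metric, the relative WER reduction quoted above is conventionally the drop in WER expressed as a percentage of the baseline WER. The sketch below illustrates that standard definition only; the function name and the example WER values are hypothetical and are not taken from the paper's experiments.

```python
def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Return the relative WER reduction in percent.

    Both inputs are WERs in percent (e.g. 20.0 for 20%). The result is
    positive when the adapted system improves on the baseline.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer


# Hypothetical values for illustration, not results from the paper:
# a baseline WER of 20.0% and an adapted WER of 18.0%.
example = relative_wer_reduction(20.0, 18.0)
print(f"relative WER reduction: {example:.1f}%")
```

Note that a relative reduction is distinct from an absolute one: in the hypothetical example the absolute reduction is 2.0 percentage points.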