ISCA Archive Interspeech 2024

Text-only Domain Adaptation for CTC-based Speech Recognition through Substitution of Implicit Linguistic Information in the Search Space

Tatsunari Takagi, Yukoh Wakabayashi, Atsunori Ogawa, Norihide Kitaoka

Domain adaptation using only language models in Automatic Speech Recognition (ASR) has been widely studied because of its practicality. Still, it remains challenging for non-autoregressive ASR models such as those based on Connectionist Temporal Classification (CTC). Against this background, this study proposes a text-only domain adaptation method for CTC-based ASR models that leverages the Density Ratio Approach (DRA). Our method combines a beam search algorithm that substitutes linguistic information according to DRA, adapted to the CTC decoding procedure, with a language model adaptation method that takes into account the conditional independence assumption of CTC. We conducted domain adaptation experiments for character-level ASR with the Corpus of Spontaneous Japanese (CSJ) and for sub-word ASR with the English-language LibriSpeech and GigaSpeech corpora. The experimental results confirmed that our proposed method achieved higher accuracy than the Shallow Fusion method in both Japanese and English.
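
To make the contrast between Shallow Fusion and the Density Ratio Approach concrete, the following is a minimal sketch of the two scoring rules in their generic, hypothesis-level form. It is not the CTC beam search proposed in this paper. The function names, the weights, and the log-probability values are all hypothetical and chosen only for illustration. Shallow Fusion adds a weighted target-domain language model score to the ASR score, whereas DRA additionally subtracts a weighted source-domain language model score, which stands in for the linguistic information the ASR model learned implicitly from its training data.

```python
# Illustrative sketch only: generic hypothesis-level scoring, not the
# paper's CTC-specific beam search. All names and numbers are hypothetical.

def shallow_fusion_score(asr_logp, target_lm_logp, lm_weight=0.5):
    """ASR log-probability plus a weighted target-domain LM log-probability."""
    return asr_logp + lm_weight * target_lm_logp


def density_ratio_score(asr_logp, source_lm_logp, target_lm_logp,
                        source_weight=0.5, target_weight=0.5):
    """Subtract the source-domain LM score, then add the target-domain one."""
    return (asr_logp
            - source_weight * source_lm_logp
            + target_weight * target_lm_logp)


# Toy log-probabilities (made up): one hypothesis that the source-domain LM
# likes, and one that the target-domain LM likes.
hypotheses = {
    "hyp_source_like": {"asr": -3.0, "src": -2.0, "tgt": -5.0},
    "hyp_target_like": {"asr": -4.5, "src": -6.0, "tgt": -4.0},
}

sf_scores = {
    name: shallow_fusion_score(h["asr"], h["tgt"])
    for name, h in hypotheses.items()
}
dra_scores = {
    name: density_ratio_score(h["asr"], h["src"], h["tgt"])
    for name, h in hypotheses.items()
}

best_sf = max(sf_scores, key=sf_scores.get)
best_dra = max(dra_scores, key=dra_scores.get)
print("Shallow Fusion picks:", best_sf)
print("Density Ratio picks:", best_dra)
```

With these toy values, Shallow Fusion keeps the hypothesis favored by the ASR model's source-domain bias, while the density ratio score discounts that bias and prefers the hypothesis that fits the target domain. Real systems apply such scores inside the search procedure and tune the weights on development data.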