ISCA Archive Interspeech 2022

Text-Only Domain Adaptation Based on Intermediate CTC

Hiroaki Sato, Tomoyasu Komori, Takeshi Mishima, Yoshihiko Kawai, Takahiro Mochizuki, Shoei Sato, Tetsuji Ogawa

We propose a domain adaptation method that enables connectionist temporal classification (CTC)-based end-to-end (E2E) automatic speech recognition (ASR) models to adapt to a target domain using unpaired text data. The performance of ASR models deteriorates for words and topics not present in the training data, such as the latest news. Although paired speech and text data are difficult to collect for such subjects, unpaired text data is relatively easy to obtain. The proposed method builds on an E2E ASR model based on intermediate CTC. It introduces an adaptation branch that embeds acoustic and linguistic information in the same latent space, which allows domain adaptation using unpaired text data of the target domain. Experimental comparisons in multiple out-of-domain settings demonstrate that the proposed text-only domain adaptation achieves comparable or better performance than existing shallow-fusion-based domain adaptation, and that integrating it with shallow fusion yields further improvement.
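For readers unfamiliar with the shallow-fusion baseline mentioned above, the following is a minimal sketch of the usual log-linear score combination, in which an external language model trained on target-domain text re-weights ASR hypotheses. It is not the authors' implementation: the function names, the example hypotheses, their log-probabilities, and the weight of 0.3 are all hypothetical, and the interpolation is applied here to whole hypotheses for brevity, whereas shallow fusion is typically applied token by token inside beam search.

```python
# Illustrative sketch only: names, scores, and the LM weight are assumptions,
# not values or code from the paper.

def fused_score(asr_logp: float, lm_logp: float, lm_weight: float) -> float:
    """Log-linear interpolation: log p_ASR(y|x) + lm_weight * log p_LM(y)."""
    return asr_logp + lm_weight * lm_logp


def pick_best(hypotheses, lm_weight: float = 0.3) -> str:
    """Return the text of the hypothesis with the highest fused score.

    `hypotheses` is a list of (text, asr_logp, lm_logp) tuples.
    """
    best = max(hypotheses, key=lambda h: fused_score(h[1], h[2], lm_weight))
    return best[0]


# Hypothetical two-best list: the ASR model slightly prefers the wrong
# hypothesis, while the target-domain LM strongly prefers the right one.
HYPOTHESES = [
    ("the latest muse", -4.0, -12.0),
    ("the latest news", -4.5, -6.0),
]

without_lm = pick_best(HYPOTHESES, lm_weight=0.0)
with_lm = pick_best(HYPOTHESES, lm_weight=0.3)
```

With the weight set to zero the ranking depends on the ASR score alone; a positive weight lets the text-only language model shift the decision toward in-domain wording. In practice the weight is tuned on held-out data.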