ISCA Archive Interspeech 2022

Improving Recognition of Out-of-vocabulary Words in E2E Code-switching ASR by Fusing Speech Generation Methods

Lingxuan Ye, Gaofeng Cheng, Runyan Yang, Zehui Yang, Sanli Tian, Pengyuan Zhang, Yonghong Yan

Out-of-vocabulary (OOV) words are a common problem for end-to-end (E2E) ASR. In code-switching (CS), the OOV problem in the embedded language is further aggravated and becomes a primary obstacle to deploying E2E code-switching speech recognition (CSSR) systems. Existing recipes for monolingual scenarios typically take advantage of text-to-speech (TTS) synthesis or use fine-grained modeling units. However, the sparsity of CS greatly decreases the probability that words (mainly those of the embedded language) are covered, which hinders the collection of corresponding CS text for TTS. Using fine-grained units brings limited improvement on OOV words while increasing the risk of misspelling. In this paper, we propose two distinct CS speech generation methods to improve the recognition of OOV words by CSSR systems. First, we utilize monolingual corpora to generate spliced CS speech containing OOV words. Second, we propose an algorithm to generate CS text containing OOV words, which enables the use of TTS to synthesize CS speech. Both methods are carefully designed to ensure the acoustic and semantic smoothness of the generated speech. In addition, we provide restrictive methods to suppress the side effects of using artificially generated data and to help avoid misspelling. Finally, we reduce WER on OOV words by 56.3% absolute on the test set.