ISCA Archive Interspeech 2023

Target Vocabulary Recognition Based on Multi-Task Learning with Decomposed Teacher Sequences

Aoi Ito, Tatsuya Komatsu, Yusuke Fujita, Yusuke Kida

This paper proposes a method for target vocabulary recognition based on multi-task learning with decomposed teacher sequences. The proposed method first decomposes each teacher sequence into a target vocabulary sequence and a non-target vocabulary sequence. Multi-task learning is then performed by calculating a loss for each of the two sequences. By using information from both the target and the non-target vocabulary, the proposed method trains more stably and recognizes the target vocabulary more accurately than single-task learning that uses only the target vocabulary. Experiments on the Corpus of Spontaneous Japanese (CSJ), with numerals and katakana as the target vocabulary, demonstrate the effectiveness of the proposed method. The results show a maximum CER improvement rate of 27% for katakana and 34% for numerals in target vocabulary recognition, as well as an 84% reduction in insertion errors on utterances that contain only non-target vocabulary.
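The decomposition and the two-loss objective described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`decompose`, `multitask_loss`, `is_katakana`), the choice to simply drop tokens that belong to the other sequence, and the interpolation weight `weight` are all assumptions made for illustration, since the abstract does not specify how the sequences are formed or how the two losses are combined.

```python
from typing import Callable, List, Tuple


def is_katakana(token: str) -> bool:
    """Return True if every character lies in the Unicode Katakana block."""
    return bool(token) and all("\u30a0" <= ch <= "\u30ff" for ch in token)


def decompose(
    tokens: List[str], is_target: Callable[[str], bool]
) -> Tuple[List[str], List[str]]:
    """Split one teacher sequence into target and non-target sequences.

    Assumption: tokens of the other class are simply dropped; the paper
    may instead keep positions with a placeholder symbol.
    """
    target = [t for t in tokens if is_target(t)]
    non_target = [t for t in tokens if not is_target(t)]
    return target, non_target


def multitask_loss(
    loss_target: float, loss_non_target: float, weight: float = 0.5
) -> float:
    """Combine the two per-sequence losses.

    Assumption: a linear interpolation with a hypothetical weight.
    """
    return weight * loss_target + (1.0 - weight) * loss_non_target


# Usage: katakana as the target vocabulary.
teacher = ["コーヒー", "を", "飲む"]
target_seq, non_target_seq = decompose(teacher, is_katakana)
# The loss values below are placeholders, not measured results.
total = multitask_loss(loss_target=2.0, loss_non_target=4.0, weight=0.5)
```

In a real system each loss would come from the recognizer's training criterion evaluated against the corresponding decomposed sequence; the scalar values here only show how the two terms might be combined.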