ISCA Archive Interspeech 2022

Improving Data Driven Inverse Text Normalization using Data Augmentation and Machine Translation

Debjyoti Paul, Yutong Pang, Szu-Jui Chen, Xuedong Zhang

Inverse text normalization (ITN) is used to convert the spoken-form output of an automatic speech recognition (ASR) system to written form. Traditional handcrafted ITN rules can be complex to write and maintain. Meanwhile, neural modeling approaches require high-quality, large-scale spoken-written pair examples from the same or a similar domain as the ASR system (in-domain data) for training. Both approaches require costly and complex annotation. In this paper, we present a data augmentation technique based on neural machine translation that effectively generates rich spoken-written pairs for both high- and low-resource languages. We empirically demonstrate that ITN models in the target language, trained using our data augmentation and machine translation technique, can achieve performance similar to that of English (en) ITN models trained directly on in-domain data.
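To make the spoken-to-written mapping concrete, the following is a toy sketch of the kind of handcrafted ITN rule the abstract contrasts with learned models. It is not the authors' method, which trains neural ITN models on spoken-written pairs. The function names `read_number` and `inverse_normalize`, and the deliberately tiny coverage (English integers below 100 plus one currency rule), are hypothetical choices made for illustration only.

```python
from typing import List, Optional, Tuple

# Hypothetical toy lexicons for English number words below 100.
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def read_number(tokens: List[str], i: int) -> Tuple[Optional[int], int]:
    """Read a number below 100 starting at tokens[i].

    Returns (value, next_index); value is None if no number starts at i.
    """
    if i < len(tokens) and tokens[i] in UNITS:
        return UNITS[tokens[i]], i + 1
    if i < len(tokens) and tokens[i] in TENS:
        value = TENS[tokens[i]]
        # Optional digit 1-9 after a tens word, e.g. "twenty three" -> 23.
        if i + 1 < len(tokens) and 1 <= UNITS.get(tokens[i + 1], 0) <= 9:
            return value + UNITS[tokens[i + 1]], i + 2
        return value, i + 1
    return None, i


def inverse_normalize(spoken: str) -> str:
    """Map lowercase spoken-form text to written form with toy rules."""
    tokens = spoken.lower().split()
    out: List[str] = []
    i = 0
    while i < len(tokens):
        value, j = read_number(tokens, i)
        if value is None:
            # Not a number word: copy the token through unchanged.
            out.append(tokens[i])
            i += 1
        elif j < len(tokens) and tokens[j] in ("dollar", "dollars"):
            # Currency rule: "<number> dollars" -> "$<number>".
            out.append(f"${value}")
            i = j + 1
        else:
            out.append(str(value))
            i = j
    return " ".join(out)


print(inverse_normalize("i paid twenty three dollars for two tickets"))
```

This prints `i paid $23 for 2 tickets`. Even this small rule set already misfires on context-dependent input: `inverse_normalize("the one i want")` yields `the 1 i want`, and it ignores casing, larger numbers, dates, and every other language. Patching each such case by hand is what makes rule-based ITN hard to maintain, and it is one motivation for learning the mapping from spoken-written pairs instead.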