ISCA Archive Interspeech 2024

Towards End-to-End Unified Recognition for Mandarin and Cantonese

Meiling Chen, Pengjie Liu, Heng Yang, Haofeng Wang

Building an automatic speech recognition (ASR) system that supports both Mandarin and Cantonese is challenging. The approach of pre-training two separate speech recognition models and then selecting one of them for recognition through an additional mechanism is complex and resource-intensive. This paper presents an end-to-end system for unified Mandarin-Cantonese recognition, together with a complete model training method for scenarios in which high-resource and low-resource languages coexist, while reducing system complexity. We also study the impact of different modeling units on character error rate (CER) and training efficiency. In addition, the system incorporates a language identification (LID) module to reduce context confusion during recognition. Experiments show that, compared with Mandarin-only and Cantonese-only models, our system achieves relative CER reductions of 12.71% for Mandarin and 21.23% for Cantonese, and training efficiency can be doubled.
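
For readers unfamiliar with the metric, the following is a minimal sketch of how CER and a relative CER reduction are conventionally computed. It assumes the standard definition of CER as character-level edit distance divided by reference length. The function names and the example strings and values are hypothetical illustrations, not the paper's evaluation code or results.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference characters
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalised by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline_cer: float, new_cer: float) -> float:
    """Relative CER reduction of a new system over a baseline."""
    if baseline_cer <= 0:
        raise ValueError("baseline CER must be positive")
    return (baseline_cer - new_cer) / baseline_cer


# Hypothetical example: one substituted character out of five.
example_cer = cer("今天天氣好", "今天天气好")

# Hypothetical CER values, chosen only to illustrate the arithmetic.
example_reduction = relative_reduction(0.10, 0.08)
```

Under this definition, a relative reduction of 12.71% means the unified system's CER is 87.29% of the corresponding monolingual baseline's CER, not that the absolute CER dropped by 12.71 points.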