Building an automatic speech recognition (ASR) system that supports both Mandarin and Cantonese is challenging. The approach of pre-training two separate speech recognition models and then selecting one of them for recognition through an additional mechanism is complex and resource-intensive. This paper presents an end-to-end system for unified Mandarin-Cantonese recognition, together with a complete model training method for scenarios where high-resource and low-resource languages coexist, which reduces this complexity. We also study the impact of different modeling units on character error rate (CER) and training efficiency. In addition, the system incorporates a language identification (LID) module to reduce context confusion during recognition. Experiments show that, compared with Mandarin-only and Cantonese-only models, our system achieves relative CER reductions of 12.71% for Mandarin and 21.23% for Cantonese, and that training efficiency can be doubled.
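
For readers unfamiliar with the metric, the following minimal sketch shows how CER and a relative CER reduction are conventionally computed. It assumes the standard definition of CER as character-level edit distance divided by reference length; the function names and the example strings and values are hypothetical illustrations and are not taken from the system or experiments described here.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the first i-1 chars of ref
    # and the first j chars of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference characters
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline_cer: float, new_cer: float) -> float:
    """Relative CER reduction of a new system with respect to a baseline."""
    if baseline_cer <= 0:
        raise ValueError("baseline CER must be positive")
    return (baseline_cer - new_cer) / baseline_cer


# Hypothetical example: one substituted character in a six-character reference.
example_cer = cer("今天天气很好", "今天天汽很好")

# Hypothetical CER values, chosen only to illustrate the arithmetic.
example_reduction = relative_reduction(0.10, 0.08)
```

Under this definition, a relative reduction is a fraction of the baseline error rate rather than an absolute difference in percentage points.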