ISCA Archive Interspeech 2022

Finer-grained Modeling Units-based Meta-Learning for Low-resource Tibetan Speech Recognition

Siqing Qin, Longbiao Wang, Sheng Li, Yuqin Lin, Jianwu Dang

Tibetan is a typical under-resourced language due to its relatively small speaker population. Although character-based end-to-end (E2E) automatic speech recognition (ASR) models with transfer learning and multilingual training strategies have mitigated the low-resource problem, they often suffer from overfitting. Meta-learning has recently proven effective at alleviating overfitting. However, the widely used coarse-grained modeling units correlate only weakly with their pronunciation, which limits the performance gains of low-resource ASR systems. Furthermore, meta-learning consists of a meta-training period followed by fast self-adaptation on the target language, and previous meta-training schemes lack target language-specific information. Therefore, we propose a novel E2E low-resource Lhasa-dialect ASR model based on finer-grained modeling units and transfer learning, designed with reference to the properties of Chinese Pinyin. Chinese Pinyin and decomposed Tibetan radicals are more closely related to pronunciation than characters are, so they can supply more acoustic information in low-resource settings. In addition, Tibetan modeling units are used in both the meta-training and fast self-adaptation processes to provide language-specific information and thereby address the low-resource problem. Experiments show that the proposed method achieves a 54.9% relative character error rate reduction over the baseline system.
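
To make the idea of finer-grained units concrete: a Tibetan syllable already stores its radicals (root letter, vowel sign, suffix, and so on) as separate Unicode code points, much as a Chinese character maps to Pinyin initials and finals. The minimal sketch below simply enumerates those code points; the example syllable ("bod", meaning Tibet) is chosen for illustration and is not taken from the paper's data or tokenization pipeline.

```python
# Illustrative decomposition of a Tibetan syllable into radical-level units.
# This is a generic Unicode walk, not the paper's exact unit inventory.
import unicodedata

syllable = "\u0f56\u0f7c\u0f51"   # བོད = root letter BA + vowel sign O + suffix DA

for ch in syllable:
    # Each code point is one candidate finer-grained modeling unit.
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# Output:
# U+0F56  TIBETAN LETTER BA
# U+0F7C  TIBETAN VOWEL SIGN O
# U+0F51  TIBETAN LETTER DA
```

Because these units are shared across many syllables, each one is observed far more often in a small corpus than whole characters are, which is the intuition behind the claimed gain in acoustic coverage.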
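The meta-training plus fast self-adaptation loop mentioned above follows the general shape of MAML-style training. The following is a minimal first-order sketch under stated assumptions: a toy linear model and synthetic batches stand in for the E2E ASR network and per-language tasks, and the hyperparameters are placeholders, not the authors' recipe. Per the paper's proposal, the target language (Tibetan) would also appear among the meta-training tasks.

```python
# First-order MAML-style sketch: meta-train on source-language tasks,
# then fast-adapt on the target language. Toy data throughout.
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_task(num_samples=32, in_dim=16, out_dim=8):
    """Synthetic stand-in for one language's (features, labels) batch."""
    return torch.randn(num_samples, in_dim), torch.randint(0, out_dim, (num_samples,))

model = nn.Linear(16, 8)          # stand-in for the E2E ASR model
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
inner_lr = 1e-2

# Meta-training: each task contributes query-set gradients from an
# inner-loop-adapted copy back to the shared initialization.
for step in range(100):
    meta_opt.zero_grad()
    for _ in range(4):                              # tasks per meta-batch
        x_sup, y_sup = make_task()                  # support set
        x_qry, y_qry = make_task()                  # query set
        fast = copy.deepcopy(model)                 # task-specific copy
        inner_opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
        loss_fn(fast(x_sup), y_sup).backward()      # one inner-loop step
        inner_opt.step()
        qry_loss = loss_fn(fast(x_qry), y_qry)
        grads = torch.autograd.grad(qry_loss, list(fast.parameters()))
        for p, g in zip(model.parameters(), grads):
            p.grad = g if p.grad is None else p.grad + g
    meta_opt.step()

# Fast self-adaptation on the target language from the meta-learned init.
x_tgt, y_tgt = make_task()
tgt_opt = torch.optim.SGD(model.parameters(), lr=inner_lr)
for _ in range(10):
    tgt_opt.zero_grad()
    loss_fn(model(x_tgt), y_tgt).backward()
    tgt_opt.step()
```

Applying the adapted copy's query gradients directly to the shared parameters is the first-order approximation; full MAML would instead differentiate through the inner-loop update itself.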