ISCA Archive Interspeech 2023

Fast and Efficient Multilingual Self-Supervised Pre-training for Low-Resource Speech Recognition

Zhilong Zhang, Wei Wang, Yanmin Qian

Recent advances in self-supervised learning (SSL) have remarkably improved speech recognition performance for low-resource languages. However, as SSL requires increasingly large amounts of data, the pretraining process has become extremely time-consuming. To address this problem, we propose an unsupervised data selection method based on utterance-level language similarity, together with a curriculum learning strategy, to improve the efficiency of multilingual SSL pretraining while maintaining performance. We conduct experiments on five languages from the COMMONVOICE dataset. Compared to the baseline pretrained on all of the data, our method pretrains on only 25% of the data and saves 60% of the training steps while achieving even better performance on the target low-resource language.
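
The abstract does not specify how the utterance-level similarity scores are computed or how the curriculum is ordered. The sketch below is only an illustration of the general idea under two assumptions: each utterance is represented by a language-identification embedding, and the curriculum orders selected utterances from most to least similar to a target-language centroid; the paper's actual features and schedule may differ.

import numpy as np

def select_pretraining_utterances(utterance_embeddings, target_centroid, fraction=0.25):
    """Rank candidate utterances by cosine similarity to a target-language
    centroid and keep the top fraction for SSL pretraining.

    utterance_embeddings: (N, D) array, one language-ID embedding per utterance
                          (hypothetical representation, not from the paper).
    target_centroid:      (D,) mean embedding of target-language utterances.
    fraction:             share of the multilingual pool to keep (e.g., 0.25).
    """
    # Cosine similarity between each utterance and the target-language centroid.
    emb = utterance_embeddings / np.linalg.norm(utterance_embeddings, axis=1, keepdims=True)
    cen = target_centroid / np.linalg.norm(target_centroid)
    similarity = emb @ cen

    # Keep the most target-like utterances.
    n_keep = int(len(similarity) * fraction)
    selected = np.argsort(-similarity)[:n_keep]

    # One plausible curriculum: present the selected utterances from most to
    # least similar, so pretraining sees target-like data first.
    return selected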