ISCA Archive Interspeech 2023

UniSplice: Universal Cross-Lingual Data Splicing for Low-Resource ASR

Wei Wang, Yanmin Qian

End-to-end (E2E) automatic speech recognition (ASR) has made remarkable progress thanks to abundant annotated data in a few rich-resource languages. However, data scarcity remains a challenge for most of the world's languages. To address this, we propose UniSplice, a novel cross-lingual speech synthesis framework based on data splicing that uses self-supervised learning (SSL) units from Hidden Unit BERT (HuBERT) as universal phonetic units. The approach splices speech fragments from rich-resource languages into complete utterances that acoustically match text in low-resource languages. UniSplice eliminates the need for computationally expensive neural text-to-speech (TTS) models, which makes it possible to train ASR models on speech synthesized on the fly. Experiments on the Common Voice dataset in a 10-hour low-resource setup show 20-30% relative improvement for four Indo-European languages and about 15% for Turkish when rescoring with a 4-gram language model.
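To illustrate why splicing is cheap enough to run on the fly, here is a minimal sketch of the general idea. It is not the authors' implementation. It assumes a "unit bank" of rich-resource waveform fragments, each already labelled with one discrete HuBERT-style unit ID, and it assumes the target low-resource text has already been converted to a unit sequence. That text-to-unit step is not shown. The names `build_unit_bank`, `dedup_units` and `splice` are hypothetical, and waveforms are plain lists of floats so that the example uses only the standard library.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Waveform = List[float]  # hypothetical: mono audio samples as plain floats


def build_unit_bank(segments: Iterable[Tuple[int, Waveform]]) -> Dict[int, List[Waveform]]:
    """Group rich-resource speech fragments by the discrete unit ID they are labelled with."""
    bank: Dict[int, List[Waveform]] = defaultdict(list)
    for unit_id, wav in segments:
        bank[unit_id].append(wav)
    return dict(bank)


def dedup_units(units: Sequence[int]) -> List[int]:
    """Collapse consecutive repeated units (assumption: one fragment per unit run)."""
    out: List[int] = []
    for u in units:
        if not out or out[-1] != u:
            out.append(u)
    return out


def splice(units: Sequence[int], bank: Dict[int, List[Waveform]], rng: random.Random) -> Waveform:
    """Concatenate one randomly chosen fragment per unit.

    Units that have no fragment in the bank are skipped. Resampling with a fresh
    rng on every call yields a different utterance for the same unit sequence,
    which is what makes on-the-fly augmentation cheap (lookup + concatenation only).
    """
    audio: Waveform = []
    for u in dedup_units(units):
        candidates = bank.get(u)
        if candidates:
            audio.extend(rng.choice(candidates))
    return audio
```

A toy usage example, where the unit IDs and samples are made up:

```python
bank = build_unit_bank([(3, [0.1, 0.2]), (3, [0.15, 0.25]), (7, [0.5]), (9, [-0.3, -0.1, 0.0])])
wav = splice([3, 3, 7, 9, 9], bank, random.Random(0))  # hypothetical units for one sentence
```

Because every call to `splice` only performs dictionary lookups and list concatenation, a data loader could call it for each training step instead of running a neural TTS model.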