ISCA Archive Interspeech 2025

From Scarcity to Sufficiency: Speech Recognition Pipeline for Zero-resource Language

Nikolay Karpov, Sofia Kostandian, Nune Tadevosyan, Alexan Ayrapetyan, Andrei Andrusenko, Ara Yeroyan, Mher Yerznkanyan, Vitaly Lavrukhin

The quality of Automatic Speech Recognition (ASR) systems depends largely on the availability of training data, which exists mainly for high-resource or low-resource languages. In contrast, languages such as Armenian face significant challenges because almost no public speech or text corpora are available. In this paper, we introduce a comprehensive framework that raises data availability for a zero-resource language to the point where a fully operational online ASR model can be developed. Our approach collects and processes data from diverse sources, including audiobooks, paid crowdsourcing, and a volunteer platform, to assemble a labeled dataset totaling 149 hours. This data made it possible to apply pseudo-labeling to an additional 145 hours of public audio data, achieving a new state-of-the-art Word Error Rate (WER) of 9.90% on the Common Voice test set. All datasets and ASR models are open-sourced.
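For readers unfamiliar with the metric reported above, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a hypothesis and its reference transcript, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the evaluation code used in this work; the function name `word_error_rate` is hypothetical, and whitespace tokenization without any text normalization is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly;
    no casing or punctuation normalization is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a corpus-level score such as the one reported is usually obtained by summing edit counts and reference lengths over all utterances before dividing, rather than averaging per-utterance rates.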