ISCA Archive Interspeech 2025

Automatic Speech Recognition for Low-Resourced Middle Eastern Languages

Razhan Hameed, Sina Ahmadi, Hanah Hadi, Rico Sennrich

Despite significant advancements in language and speech technologies, many languages in the Middle East remain under-served, leading to a technological disparity that negatively impacts these languages. This paper presents a pioneering effort to address this issue by focusing on speech technologies for low-resourced languages in the Middle East. We introduce a community-driven, volunteer-based initiative to collect audio recordings for six languages spoken by an estimated 30 million people. Through this initiative, we collect over 40 hours of speech data, with 75% of utterances based on multilingual parallel corpora. In our experiments, we demonstrate the impact of data collection and model fine-tuning on the performance of speech technologies for these languages. This research serves as a crucial step towards preserving and promoting linguistic diversity in the Middle East while ensuring equal access to speech technologies for all language communities.