ISCA Archive SynData4GenAI 2024

Language Technology for All: Leveraging Foundational Speech Models to Empower Low-Resource Languages

Sakriani Sakti

The development of advanced spoken language technologies, such as automatic speech recognition (ASR) and text-to-speech synthesis (TTS), has enabled computers to listen and speak. While many applications and services are now available, they fully support fewer than 100 languages. Although recent research may cover up to 1,000 languages, more than 6,000 living languages, spoken by 350 million people, remain uncovered. This gap exists because most systems are built using supervised machine learning, which requires large amounts of speech paired with corresponding transcriptions.

In this talk, I will introduce several successful approaches that aim to achieve language technology for all by leveraging foundational speech models to support linguistic diversity with limited data. These approaches include self-supervised learning, visually grounded models, and the machine speech chain. I will also share insights and feedback from the indigenous community, gathered from events, workshops, and panel discussions over the years. The challenge is not only how to build language technologies that serve linguistic diversity, but also how to ensure that these technologies are truly beneficial to under-resourced language communities.