ISCA Archive Interspeech 2025

Multilingual Speech Assessment Using Cross-Attention and Multitask Learning

Sehyun Oh, Minhwa Chung, Sunhee Kim

Automatic speech assessment plays a vital role in language learning by providing feedback on pronunciation, fluency, and overall speaking ability. However, developing effective multilingual speech assessment systems is challenging because of the complexity of modeling multiple languages and the limited availability of labeled data, especially for languages other than English. In this study, we propose a multilingual speech assessment system for three languages (English, German, and French) as spoken by Korean learners. Using cross-attention and multitask learning, our model builds on pre-trained models to capture both language-specific and cross-linguistic features and predicts overall speaking proficiency scores directly from raw speech audio. Experimental results show that the proposed method, especially with wav2vec 2.0, outperforms monolingual models on both seen and unseen data.