ISCA Archive Interspeech 2023

Disentangling the Contribution of Non-native Speech in Automated Pronunciation Assessment

Shuju Shi, Kaiqi Fu, Yiwei Gu, Xiaohai Tian, Shaojun Gao, Wei Li, Zejun Ma

This study explores the impact of using non-native speech data in acoustic model training for pronunciation assessment systems. The goal is to determine how introducing non-native data in acoustic model training can influence alignment accuracy and assessment performance. Acoustic models are trained using different combinations of native and non-native speech data, and the Goodness of Pronunciation (GOP) metric is used to evaluate performance. Results show that models trained with manually labeled non-native data yield the highest assessment performance and alignment accuracy. Models trained with mixed non-native and native data perform best when considering the GOP distribution on both non-native and native speech. Additionally, models trained with native data are more robust to alignment variations. These findings highlight the importance of carefully selecting and incorporating non-native data in acoustic model training for pronunciation assessment systems.
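The abstract does not say which GOP formulation is used. As a rough illustration only, the sketch below assumes one common DNN-based variant: the average log posterior of the canonical phone over the frames that forced alignment assigns to it. The function name `gop_scores`, the input layout, and the toy posterior values are all hypothetical and are not taken from the paper.

```python
import math


def gop_scores(log_posteriors, alignment):
    """Compute a per-phone GOP score (assumed variant, not the paper's).

    log_posteriors: list of frames; each frame is a dict mapping
        phone label -> log posterior from the acoustic model.
    alignment: list of (phone, start_frame, end_frame) tuples from
        forced alignment, with end_frame exclusive.
    Returns a list of (phone, score) pairs, where score is the mean
    log posterior of the canonical phone over its aligned frames.
    """
    scores = []
    for phone, start, end in alignment:
        if end <= start:
            raise ValueError(f"empty segment for phone {phone!r}")
        frames = log_posteriors[start:end]
        mean_log_post = sum(frame[phone] for frame in frames) / len(frames)
        scores.append((phone, mean_log_post))
    return scores


# Toy data (made up): three frames, two candidate phones.
frames = [
    {"ae": math.log(0.8), "eh": math.log(0.2)},
    {"ae": math.log(0.5), "eh": math.log(0.5)},
    {"ae": math.log(0.1), "eh": math.log(0.9)},
]

# Reference alignment: "ae" covers frames 0-1, "eh" covers frame 2.
scores = gop_scores(frames, [("ae", 0, 2), ("eh", 2, 3)])

# Same utterance with the boundary shifted one frame to the left.
shifted = gop_scores(frames, [("ae", 0, 1), ("eh", 1, 3)])

for phone, score in scores:
    print(f"{phone}: {score:.3f}")
```

Because the score is averaged over the aligned frames, moving a segment boundary changes which posteriors are included. Under this assumed formulation, that is one way alignment accuracy and GOP-based assessment performance are tied together.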