ISCA Archive SpeechProsody 2024

L2 Prosody Assessment by Combining Acoustic and Neural Model Features

Wenwei Dong, Roeland van Hout, Catia Cucchiarini, Helmer Strik

Computer-Assisted Language Learning systems are becoming increasingly popular, but most focus on the segmental level, even though research on second language (L2) intelligibility emphasizes the important role of prosody. In this paper, we investigate methods for calculating L2 prosody scores automatically, using speechocean762, an L2 English corpus rated by experts at the utterance level for prosody, pronunciation accuracy, and fluency. To develop an automatic L2 prosody assessment method, we first extracted 107 acoustic features and applied regression analyses, then used Lasso regression and Recursive Feature Elimination to select the features most relevant to prosody, fluency, and accuracy. We also explored a Kaldi-based acoustic model trained on native data to estimate L2 performance at the utterance level. The results show that combining the selected acoustic features with transformed Kaldi-based scores predicts the experts' evaluations best. Prosodic features (loudness, duration, F0) are important not only for the prosody evaluation but also for fluency and accuracy, and other features play a role as well. Our outcomes show that L2 prosody is an important characteristic of L2 speech and that automatically obtained prosodic measures can be helpful in evaluating L2 performance.
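To make the Lasso-based feature selection step more concrete, the following is a minimal sketch, not the authors' implementation. It assumes synthetic data (8 random features instead of the 107 acoustic features, and an artificial score driven by only two of them), a hand-written cyclic coordinate-descent solver, and an arbitrary penalty `alpha=0.2`; all function and variable names are hypothetical. Recursive Feature Elimination and the Kaldi-based scores are not covered here.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the Lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def standardize(columns):
    """Scale each feature column to zero mean and unit variance."""
    scaled = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        std = (sum((v - mean) ** 2 for v in col) / n) ** 0.5 or 1.0
        scaled.append([(v - mean) / std for v in col])
    return scaled


def lasso_coordinate_descent(columns, target, alpha, n_iter=200):
    """Minimize (1/2n)*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent."""
    n = len(target)
    y_mean = sum(target) / n
    residual = [t - y_mean for t in target]  # all weights start at zero
    weights = [0.0] * len(columns)
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            norm = sum(c * c for c in col) / n
            if norm == 0.0:
                continue  # a constant feature carries no information
            # Correlation of feature j with the residual that excludes feature j
            rho = sum(c * (r + weights[j] * c) for c, r in zip(col, residual)) / n
            new_w = soft_threshold(rho, alpha) / norm
            delta = new_w - weights[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                weights[j] = new_w
    return weights


# Synthetic stand-in data (assumption): only features 0 and 3 drive the score.
rng = random.Random(0)
n_utterances, n_features = 200, 8
columns = [[rng.gauss(0.0, 1.0) for _ in range(n_utterances)]
           for _ in range(n_features)]
scores = [2.0 * columns[0][i] - 1.5 * columns[3][i] + rng.gauss(0.0, 0.1)
          for i in range(n_utterances)]

weights = lasso_coordinate_descent(standardize(columns), scores, alpha=0.2)
selected = [j for j, w in enumerate(weights) if w != 0.0]
print("selected feature indices:", selected)
```

In this toy setting the L1 penalty should drive the weights of the uninformative features to exactly zero, so `selected` keeps only the two features that generate the score. In practice, the penalty strength would normally be tuned (for example by cross-validation) rather than fixed by hand.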