ISCA Archive SLaTE 2023

End-to-End Mispronunciation Detection and Diagnosis for Non-native English Speech

Wenwei Dong, Catia Cucchiarini, Helmer Strik

Researchers typically use native speech data to develop Mispronunciation Detection and Diagnosis (MD&D) models. However, models trained on native data, which contain few mispronunciations, tend to overlook pronunciation errors, and this is problematic for the MD&D task. We propose three methods to reduce this mismatch between native-trained models and the MD&D task. First, we adapted the model by randomly replacing a fixed percentage of the phones that are error-prone for non-native speakers. Second, we added a second Connectionist Temporal Classification (CTC) module to the baseline model. This module uses smaller classification units than the original CTC and focuses on identifying error-prone phones. Third, we further narrowed down the MD&D decoding paths. Compared to the baseline, the first method improved the F1 score by 3.38%, and the second and third methods improved it by 2.12% and 2.95%, respectively. Fusing the methods yielded a final F1 improvement of 5.39%.
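The first method, random replacement of a fixed percentage of error-prone phones, can be illustrated with a minimal sketch. The sketch assumes that the replacement is applied to the canonical phone sequence of a native training utterance. It also assumes a confusion table mapping each error-prone phone to substitutes that non-native speakers commonly produce. The table `CONFUSIONS`, the function `replace_error_prone`, and its `ratio` parameter are hypothetical illustrations, not the authors' implementation or their actual error-prone phone set.

```python
import random
from typing import Dict, List, Sequence

# Hypothetical confusion table (ARPAbet-style symbols): error-prone phone ->
# phones a non-native speaker might produce instead. Illustrative only.
CONFUSIONS: Dict[str, List[str]] = {
    "TH": ["S", "T", "F"],
    "DH": ["D", "Z"],
    "IH": ["IY"],
    "V": ["W", "F"],
}


def replace_error_prone(
    phones: Sequence[str],
    confusions: Dict[str, List[str]],
    ratio: float,
    rng: random.Random,
) -> List[str]:
    """Replace a fixed fraction of the error-prone phones with a confusable phone.

    Only phones that appear in `confusions` are candidates; `ratio` is the
    fraction of those candidates (rounded down) that get replaced.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    out = list(phones)
    # Positions holding an error-prone phone (one with listed substitutes).
    candidates = [i for i, p in enumerate(out) if confusions.get(p)]
    k = int(ratio * len(candidates))
    # Pick k distinct positions at random and substitute each one.
    for i in rng.sample(candidates, k):
        out[i] = rng.choice(confusions[out[i]])
    return out


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducible augmentation
    canonical = ["DH", "IH", "S", "TH", "IH", "NG"]
    print(replace_error_prone(canonical, CONFUSIONS, 0.5, rng))
```

In this example, four of the six phones (`DH`, `IH`, `TH`, `IH`) are error-prone candidates, so a ratio of 0.5 replaces exactly two of them and leaves `S` and `NG` untouched. Restricting substitutions to a per-phone confusion list, rather than to arbitrary phones, keeps the simulated errors close to plausible non-native mispronunciations.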