Language instructors spend considerable effort providing feedback on spoken language tasks, and automated speech engines can help learners of tonal languages such as Mandarin improve their pronunciation. This study evaluated prominent speech-assessment APIs on their ability to assess oral readings by L2 Mandarin learners, with particular emphasis on detecting lexical tones and pauses. For lexical tones, the automated engines were able to detect tones and return a corresponding tonal pronunciation score. Overall tonal-diagnosis accuracy reached 80%, as determined from false rejection and false acceptance rates, and the rating distribution across the four lexical tones closely matched human ratings, offering useful insights for instructional strategies aimed at learners. For intonation and other prosodic features, the engines generated scores for pauses and overall fluency; these scores correlated at 0.6 with human raters' detection of unnatural prosodic boundaries. The study therefore recommends that current Mandarin automated rating models include more detailed descriptions of prosodic features to enrich feedback and learning opportunities for learners.
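For readers less familiar with the metrics cited above, the following Python sketch illustrates one common way such figures are derived; the counts, score lists, and variable names are hypothetical, and the exact formulation used in the study may differ.

```python
# Illustrative sketch only (not the authors' code): pooled tonal-diagnosis
# accuracy from false rejections/acceptances, and a Pearson correlation
# between engine fluency scores and human ratings. All data are toy values.

from statistics import correlation  # requires Python 3.10+

# Hypothetical per-syllable outcomes from an engine's tone diagnosis:
false_rejections = 7     # correct tones that the engine flagged as wrong
false_acceptances = 13   # wrong tones that the engine passed as correct
total_syllables = 100

# Pooled accuracy: syllables neither falsely rejected nor falsely accepted.
overall_accuracy = (total_syllables - false_rejections - false_acceptances) / total_syllables
print(f"Overall tonal-diagnosis accuracy: {overall_accuracy:.0%}")  # 80% here

# Hypothetical per-recording fluency/pause scores from the engine versus
# human ratings of the same recordings, compared with Pearson's r.
engine_scores = [85, 72, 90, 64, 78, 88, 70, 95]
human_ratings = [80, 70, 92, 60, 75, 85, 68, 90]
r = correlation(engine_scores, human_ratings)
print(f"Pearson r between engine scores and human ratings: {r:.2f}")
```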
This research was supported by the Suzhou Technological Innovation in Key Industries Project (Grant Number SYG202030) and the Chinese International Testing Research Project (Grant Number CTI2021B09).