We establish the viability of a streamlined architecture for pedagogically appropriate computer-assisted pronunciation training (CAPT) that gives second-language learners automatic feedback about their mispronunciations. The architecture uses end-to-end speech recognition models to detect mispronunciation in audio segments that correspond directly to orthographic letters, in contrast to standard mispronunciation detection, which uses phone representations. Results in a classification task show the potential for sensitivity to non-nativelike phonetic errors in grapheme-aligned segments similar to that in phone-aligned segments. Advantages of this approach over phone-based pronunciation scoring can include providing naturally comprehensible (orthographic, not phonemic) feedback to learners, being inherently open-vocabulary in the target language, and evaluating pronunciations with reference to a full range of target-language acoustic variants rather than a prespecified canonical phone sequence.
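As a purely illustrative sketch of what scoring grapheme-aligned segments might look like, the Python below averages a per-frame log posterior for each letter over its aligned frames and flags letters that fall below a threshold. It is not the authors' method, since the abstract does not specify the scoring procedure or classifier. The per-frame posteriors, the letter-to-frame alignment, the function names, and the threshold are all hypothetical assumptions made for this example.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical inputs (assumptions, not taken from the paper):
#   frames:    one dict per audio frame, mapping grapheme -> posterior probability,
#              as might be produced by some end-to-end recognizer
#   alignment: (grapheme, start_frame, end_frame_exclusive) for each letter
Frame = Dict[str, float]
Segment = Tuple[str, int, int]


def score_graphemes(frames: List[Frame], alignment: List[Segment]) -> List[Tuple[str, float]]:
    """Return the mean log posterior of each target grapheme over its aligned frames."""
    scores = []
    for grapheme, start, end in alignment:
        if not 0 <= start < end <= len(frames):
            raise ValueError(f"bad frame span for {grapheme!r}: {start}-{end}")
        # Floor the probability so a missing grapheme does not produce log(0).
        log_probs = [
            math.log(max(frames[t].get(grapheme, 0.0), 1e-10))
            for t in range(start, end)
        ]
        scores.append((grapheme, sum(log_probs) / len(log_probs)))
    return scores


def flag_low_scores(scores: List[Tuple[str, float]],
                    threshold: float = math.log(0.5)) -> List[Tuple[int, str]]:
    """Return (letter position, grapheme) for letters scoring below the threshold.

    The default threshold is arbitrary and chosen only for illustration.
    """
    return [(i, g) for i, (g, s) in enumerate(scores) if s < threshold]


# Toy example: the learner's "a" sounds more like "o" to the hypothetical model.
frames = [{"h": 0.9}, {"h": 0.8}, {"a": 0.2, "o": 0.8}, {"a": 0.1, "o": 0.9}]
alignment = [("h", 0, 2), ("a", 2, 4)]
flagged = flag_low_scores(score_graphemes(frames, alignment))
```

Because the flagged units are letters of the written word, feedback built on such scores could point at the spelling directly, which is the kind of orthographic feedback the abstract describes.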
Cite as: Richter, C., Pálsson, R., O'Brien, L., Friðriksdóttir, K., Bédi, B., Magnúsdóttir, E.H., Guðnason, J. (2023) Orthography-based Pronunciation Scoring for Better CAPT Feedback. Proc. INTERSPEECH 2023, 1004-1008, doi: 10.21437/Interspeech.2023-2577
@inproceedings{richter23b_interspeech,
  author={Caitlin Richter and Ragnar Pálsson and Luke O'Brien and Kolbrún Friðriksdóttir and Branislav Bédi and Eydís Huld Magnúsdóttir and Jón Guðnason},
  title={{Orthography-based Pronunciation Scoring for Better CAPT Feedback}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
  pages={1004--1008},
  doi={10.21437/Interspeech.2023-2577},
  issn={2308-457X}
}