ISCA Archive Interspeech 2023

Orthography-based Pronunciation Scoring for Better CAPT Feedback

Caitlin Richter, Ragnar Pálsson, Luke O'Brien, Kolbrún Friðriksdóttir, Branislav Bédi, Eydís Huld Magnúsdóttir, Jón Guðnason

We establish the viability of a streamlined architecture for pedagogically appropriate computer assisted pronunciation training (CAPT), which gives second language learners automatic feedback about their mispronunciations. This architecture takes advantage of end-to-end speech recognition models to detect mispronunciations in audio segments that correspond directly to orthographic letters, in contrast to standard mispronunciation detection, which uses phone representations. Results in a classification task show that grapheme-aligned segments can potentially offer sensitivity to non-nativelike phonetic errors similar to that of phone-aligned segments. Advantages of this approach over phone-based pronunciation scoring can include providing naturally comprehensible (orthographic, not phonemic) feedback to learners, being inherently open-vocabulary in the target language, and evaluating pronunciations with reference to a full range of target-language acoustic variants rather than a prespecified canonical phone sequence.
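To make "scoring audio segments that correspond directly to orthographic letters" concrete, here is a minimal Python sketch under stated assumptions. It assumes a CTC-style end-to-end model that outputs per-frame letter posteriors, force-aligns the intended spelling to those frames with Viterbi decoding, and scores each letter by the mean log posterior of that letter over its aligned frames (a simple goodness-of-pronunciation-style proxy). The blank symbol `_`, the `Frame` type, the functions `ctc_force_align`, `letter_scores` and `flag_letters`, and the threshold of log 0.5 are all hypothetical illustrations. This is not the authors' classifier, which the abstract describes only as a classification task over grapheme-aligned segments.

```python
import math
from typing import Dict, List, Tuple

# Sketch only: the blank symbol, scoring rule and threshold are assumptions.
BLANK = "_"  # hypothetical CTC blank symbol
Frame = Dict[str, float]  # letter -> log posterior for one audio frame


def ctc_force_align(log_probs: List[Frame], target: str) -> List[int]:
    """Viterbi-align frames to the target letters under a CTC topology.

    Returns one entry per frame: the index of the target letter the frame
    is assigned to, or -1 for frames assigned to a blank.
    """
    ext = [BLANK]  # blank, c1, blank, c2, ..., cN, blank
    for ch in target:
        ext += [ch, BLANK]
    S, T, NEG = len(ext), len(log_probs), float("-inf")
    dp = [[NEG] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    dp[0][0] = log_probs[0].get(ext[0], NEG)
    if S > 1:
        dp[0][1] = log_probs[0].get(ext[1], NEG)
    for t in range(1, T):
        for s in range(S):
            cands = [(dp[t - 1][s], s)]  # stay in the same state
            if s >= 1:
                cands.append((dp[t - 1][s - 1], s - 1))  # advance by one
            # Skip a blank only between two different letters.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                cands.append((dp[t - 1][s - 2], s - 2))
            best, prev = max(cands)
            dp[t][s] = best + log_probs[t].get(ext[s], NEG)
            back[t][s] = prev
    # A valid path ends in the last letter or the trailing blank.
    s = max((S - 1, S - 2), key=lambda k: dp[T - 1][k]) if S > 1 else 0
    path = [0] * T
    for t in range(T - 1, -1, -1):
        path[t] = s
        s = back[t][s]
    # Odd extended states are letters; even states are blanks.
    return [(st - 1) // 2 if st % 2 == 1 else -1 for st in path]


def letter_scores(log_probs: List[Frame], target: str) -> List[float]:
    """Mean log posterior of each intended letter over its aligned frames."""
    owner = ctc_force_align(log_probs, target)
    sums, counts = [0.0] * len(target), [0] * len(target)
    for t, i in enumerate(owner):
        if i >= 0:
            sums[i] += log_probs[t].get(target[i], float("-inf"))
            counts[i] += 1
    return [sums[i] / counts[i] if counts[i] else float("-inf")
            for i in range(len(target))]


def flag_letters(log_probs: List[Frame], target: str,
                 threshold: float = math.log(0.5)) -> List[Tuple[str, float, bool]]:
    """Return (letter, score, flagged) for each letter of the target word."""
    return [(ch, sc, sc < threshold)
            for ch, sc in zip(target, letter_scores(log_probs, target))]


if __name__ == "__main__":
    # Toy posteriors for the word "la": the second sound is heard as "e".
    probs = [
        {"l": 0.9, BLANK: 0.05, "a": 0.03, "e": 0.02},
        {"l": 0.6, BLANK: 0.3, "a": 0.05, "e": 0.05},
        {"a": 0.2, "e": 0.7, BLANK: 0.05, "l": 0.05},
        {"a": 0.3, "e": 0.6, BLANK: 0.05, "l": 0.05},
        {BLANK: 0.9, "a": 0.05, "e": 0.03, "l": 0.02},
    ]
    lp = [{k: math.log(v) for k, v in f.items()} for f in probs]
    print([(ch, flagged) for ch, _, flagged in flag_letters(lp, "la")])
    # expected: [('l', False), ('a', True)]
```

In this toy example the learner's vowel is heard as closer to "e", so the letter "a" is flagged and could be highlighted directly in the written word. Because each score is relative to what the model has learned about the letter's acoustic realizations, no canonical phone sequence is needed to compute it.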


doi: 10.21437/Interspeech.2023-2577

Cite as: Richter, C., Pálsson, R., O'Brien, L., Friðriksdóttir, K., Bédi, B., Magnúsdóttir, E.H., Guðnason, J. (2023) Orthography-based Pronunciation Scoring for Better CAPT Feedback. Proc. INTERSPEECH 2023, 1004-1008, doi: 10.21437/Interspeech.2023-2577

@inproceedings{richter23b_interspeech,
  author={Caitlin Richter and Ragnar Pálsson and Luke O'Brien and Kolbrún Friðriksdóttir and Branislav Bédi and Eydís Huld Magnúsdóttir and Jón Guðnason},
  title={{Orthography-based Pronunciation Scoring for Better CAPT Feedback}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
  pages={1004--1008},
  doi={10.21437/Interspeech.2023-2577},
  issn={2308-457X}
}