ISCA Archive Interspeech 2025

Towards Accurate Phonetic Error Detection Through Phoneme Similarity Modeling

Xuanru Zhou, Jiachen Lian, Cheol Jun Cho, Tejas Prabhune, Shuhe Li, William Li, Rodrigo Ortiz, Zoe Ezzes, Jet Vonk, Brittany Morin, Rian Bogley, Lisa Wauters, Zachary Miller, Maria Gorno-Tempini, Gopala Anumanchipalli

Phonetic error detection, a core subtask of automatic pronunciation assessment, aims to identify pronunciation deviations at the fine-grained phoneme level. However, variability in both speech production and perception, including accents and dysfluencies, presents a significant challenge for phoneme recognition, and current models are unable to capture these discrepancies effectively. In this work, we propose a framework for verbatim phoneme recognition that employs multi-task training with novel phoneme similarity modeling. Unlike most previous studies, which focus on transcribing what the person is supposed to say, our method aims to transcribe what the person actually said. We develop and open-source VCTK-accent, a simulated dataset containing phonetic errors, and propose two novel metrics for assessing pronunciation differences. Our work provides a new benchmark for the phonetic error detection task.