ISCA Archive SLaTE 2023

Towards Acoustic-to-Articulatory Inversion for Pronunciation Training

Charles G McGhee, Katherine M Knill, Mark Gales

Visual feedback of articulator movements using Electromagnetic Articulography (EMA) has been shown to aid acquisition of non-native speech sounds. However, physical EMA sensors are expensive and invasive, making them impractical for providing real-world pronunciation feedback. Our work focuses on using neural Acoustic-to-Articulatory Inversion (AAI) models to map speech directly to EMA sensor positions. Self-Supervised Learning (SSL) speech models, such as HuBERT, produce representations of speech that have been shown to significantly improve performance on AAI tasks. Probing experiments indicate that certain layers and training iterations of SSL models yield representations better suited to inversion than others. In this paper, we build on these probing results to create an AAI model that improves upon a state-of-the-art baseline inversion model, and we evaluate the model's suitability for pronunciation training.
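The probing setup the abstract alludes to can be illustrated with a minimal sketch: a frozen per-layer SSL representation is regressed onto EMA sensor coordinates with a linear (ridge) probe, and fit is scored with the per-channel Pearson correlation commonly reported for AAI. Everything here is illustrative and synthetic: the feature matrix stands in for layer-l HuBERT features (768-dimensional for the base model), and the EMA targets are randomly generated, not real sensor data.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D, K = 200, 768, 12  # frames, feature dim (HuBERT base), EMA channels

# Stand-in for frozen layer-l SSL features; in practice these would be
# extracted from a pretrained HuBERT model for each speech frame.
H = rng.standard_normal((T, D))

# Synthetic EMA sensor positions: a linear map of the features plus noise.
W_true = rng.standard_normal((D, K))
Y = H @ W_true + 0.01 * rng.standard_normal((T, K))

# Ridge-regression probe: W = (H^T H + lam * I)^{-1} H^T Y
lam = 1e-3
W = np.linalg.solve(H.T @ H + lam * np.eye(D), H.T @ Y)
Y_hat = H @ W

def pearson_per_channel(a, b):
    """Pearson correlation between columns of a and b (one EMA channel each)."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    return (a * b).sum(axis=0) / (
        np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=0)
    )

cc = pearson_per_channel(Y_hat, Y)
print(f"mean correlation across EMA channels: {cc.mean():.3f}")
```

Running such a probe on features from each layer of each pretraining iteration, and comparing the resulting correlations, is the kind of evidence the probing experiments referenced above would provide when selecting representations for a full AAI model.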