ISCA Archive Interspeech 2023

Exploring a classification approach using quantised articulatory movements for acoustic to articulatory inversion

Jesuraj Bandekar, Sathvik Udupa, Prasanta Kumar Ghosh

Acoustic to articulatory inversion (AAI) is the task of predicting articulatory movements from speech acoustics. An AAI model is typically optimised with regression objectives on continuous articulatory targets. In this work, we explore an alternative approach of classifying bins of quantised articulatory movements. We extend it by utilising ordinal regression, along with a novel approach involving a KL divergence loss between a target Gaussian posterior and the predicted posterior. We train transformer AAI models with MFCC and TERA acoustic features, with various quantisation types (uniform vs nonuniform) and numbers of bins. Using 16 subjects' acoustic-articulatory data, we evaluate the results with correlation coefficient (CC) and root mean squared error on unseen utterances from seen and unseen speakers. While the quantisation type did not alter the performance, we find that the highest CC (0.8838) is achieved with TERA using ordinal regression, and likewise with the proposed KL divergence loss, which is on par with the CC (0.8856) obtained using the regression baseline. Reducing the number of quantisation bins to 16 does not change the performance.
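The classification view with a Gaussian target posterior can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the articulatory value range, the width of the target Gaussian, or the direction of the KL divergence, so the range `[-1, 1]`, `sigma = 1.0` (in units of bins), and `KL(target || predicted)` below are all assumptions, and every function name is hypothetical. Only uniform quantisation is shown.

```python
import math

def uniform_bin_edges(lo, hi, n_bins):
    """Return n_bins + 1 equally spaced edges covering [lo, hi]."""
    step = (hi - lo) / n_bins
    return [lo + i * step for i in range(n_bins + 1)]

def quantise(value, edges):
    """Map a continuous value to a bin index, clipping out-of-range values."""
    n_bins = len(edges) - 1
    for k in range(n_bins):
        if value < edges[k + 1]:
            return k
    return n_bins - 1

def gaussian_target(true_bin, n_bins, sigma=1.0):
    """Discrete Gaussian over bin indices centred on the true bin (assumed sigma)."""
    weights = [math.exp(-0.5 * ((k - true_bin) / sigma) ** 2) for k in range(n_bins)]
    total = sum(weights)
    return [w / total for w in weights]

def softmax(logits):
    """Convert model logits into a predicted posterior over bins."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted) for two discrete distributions (assumed direction)."""
    return sum(t * math.log((t + eps) / (p + eps))
               for t, p in zip(target, predicted) if t > 0)

# Usage: one articulatory sample, 16 uniform bins over an assumed range [-1, 1].
edges = uniform_bin_edges(-1.0, 1.0, 16)
true_bin = quantise(0.3, edges)              # bin index 10
target = gaussian_target(true_bin, 16)
predicted = softmax([0.0] * 16)              # an untrained, uniform prediction
loss = kl_divergence(target, predicted)
```

Unlike a one-hot cross-entropy target, the Gaussian target assigns some probability to bins adjacent to the true one, so a prediction that lands in a neighbouring bin is penalised less than one that lands far away. This respects the ordering of the bins, which is also the motivation for ordinal regression.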