ISCA Archive Interspeech 2015

Tongue tracking in ultrasound images using eigentongue decomposition and artificial neural networks

Diandra Fabre, Thomas Hueber, Florent Bocquelet, Pierre Badin

This paper describes a machine learning approach for automatically extracting the tongue contour from ultrasound images. The method is developed in the context of visual articulatory biofeedback for speech therapy. The goal is to provide a speaker with an intuitive visualization of his/her tongue movements, in real time, and with minimal human intervention. Unlike the most widely used techniques, which are based on active contours, the proposed method exploits the information from all image pixels to infer the tongue contour. For that purpose, a compact representation of each image is extracted using a PCA-based decomposition technique (named EigenTongue). Artificial neural networks are then used to convert the extracted visual features into control parameters of a PCA-based tongue contour model. The proposed method is evaluated on 9 speakers, using data recorded with the ultrasound probe held manually (as in the targeted application). Speaker-dependent experiments demonstrated the effectiveness of the proposed method (with an average error of ~1.3 mm when training on 80 manually annotated images), even when the tongue contour is poorly imaged. Performance was significantly lower in speaker-independent experiments (i.e., when estimating contours for an unknown speaker), likely due to anatomical differences across speakers.
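As a rough illustration of the described pipeline (EigenTongue features from a pixel-level PCA, fed to a neural network that predicts the parameters of a PCA-based contour model), the sketch below shows one way it could be wired up with scikit-learn. All image dimensions, component counts, and the network architecture are hypothetical placeholders, not the paper's actual settings, and random data stands in for the ultrasound frames and annotated contours.

```python
# Minimal sketch of an EigenTongue + ANN pipeline (illustrative only).
# Dimensions, component counts, and the network architecture are
# assumptions for this example, not values from the paper.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Stand-in data: 80 "annotated" ultrasound frames (64x64 pixels) and,
# for each, a tongue contour sampled at 30 (x, y) points.
n_train, img_size, n_points = 80, 64 * 64, 30
images = rng.random((n_train, img_size))        # vectorized images
contours = rng.random((n_train, 2 * n_points))  # flattened contours

# Step 1 -- EigenTongue decomposition: PCA on the raw pixel vectors
# gives a compact visual feature vector per frame.
eigentongue = PCA(n_components=30).fit(images)
visual_feats = eigentongue.transform(images)

# Step 2 -- PCA-based contour model: each contour is likewise encoded
# by a small set of shape parameters.
contour_model = PCA(n_components=4).fit(contours)
contour_params = contour_model.transform(contours)

# Step 3 -- ANN regression from visual features to contour parameters.
net = MLPRegressor(hidden_layer_sizes=(50,), max_iter=2000, random_state=0)
net.fit(visual_feats, contour_params)

# Inference on a new frame: project onto the eigentongues, predict the
# shape parameters, and reconstruct the contour from the shape model.
new_frame = rng.random((1, img_size))
pred_params = net.predict(eigentongue.transform(new_frame))
pred_contour = contour_model.inverse_transform(pred_params).reshape(n_points, 2)
print(pred_contour.shape)  # (30, 2)
```

Encoding both the image and the contour with PCA keeps the regression problem low-dimensional on both sides, which is what makes training from only ~80 annotated frames plausible.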