ISCA Archive Interspeech 2012

Ultrax: an animated midsagittal vocal tract display for speech therapy

Korin Richmond, Steve Renals

Speech sound disorders (SSDs) are the most common communication impairment in childhood, and can unfortunately hamper social development and learning. Current speech therapy interventions must rely predominantly on the auditory skills of the child, as little technology is available to assist in diagnosis and therapy of SSDs. Real-time visualisation of tongue movements would bring enormous benefit. An ultrasound scanner offers this possibility, though its display has certain limitations which may make it hard to interpret. Our ultimate goal is to address these deficiencies: to exploit ultrasound to track tongue movement, but to display a simplified, diagrammatic vocal tract that is easier to interpret. In this paper, we first outline our general approach to this problem, which combines a latent space model with a dimensionality-reducing model of vocal tract shapes. Then, we present pilot work to assess the feasibility of this approach. Specifically, we use MRI scans to train a model of vocal tract shapes, then attempt to animate that model using electromagnetic articulography (EMA) data from the same speaker. Piloting with EMA data is an intermediate step. It is simpler than using ultrasound, but still provides valuable insight. Based on these initial experiments, we argue the approach is promising.

Index Terms: Ultrasound, speech therapy, vocal tract visualisation