ISCA Archive Interspeech 2015
ISCA Archive Interspeech 2015

Estimation of the air-tissue boundaries of the vocal tract in the mid-sagittal plane from electromagnetic articulograph data

Satyabrata Parida, Ashok Kumar Pattem, Prasanta Kumar Ghosh

Electromagnetic articulograph (EMA) provides movement data of sensors attached to a few flesh points on different speech articulators including lips, jaw, and tongue while a subject speaks. In this work, we quantify the amount of information these flesh points provide about the vocal tract (VT) shape in the mid-sagittal plane. VT shape is described by the air-tissue boundaries, which are obtained manually from the recordings by real-time magnetic resonance imaging (rtMRI) of a set of utterances spoken by a subject, from whom the EMA recordings of the same set of utterances are also available. We propose a two-stage approach for reconstructing the VT shape from the EMA data. The first stage involves a co-registration of the EMA data with the VT shape from the rtMRI frames. The second stage involves the estimation of the air-tissue boundaries from the co-registered EMA points. Co-registration is done by a spatio-temporal alignment of the VT shapes from the rtMRI frames and EMA sensor data, while radial basis function (RBF) network is used for estimating the air-tissue boundaries (ATBs). Experiments with the EMA and rtMRI recordings of five sentences spoken by one male and one female speakers show that the VT shape in the mid-sagittal plane can be recovered from the EMA flesh points with an average reconstruction error of 2.55 mm and 2.75 mm respectively.