ISCA Archive Interspeech 2015

Speaker-independent silent speech recognition with across-speaker articulatory normalization and speaker adaptive training

Jun Wang, Seongjun Hahm

Silent speech recognition (SSR) converts non-audio information (e.g., articulatory information) to speech. SSR has the potential to enable laryngectomees to produce synthesized speech with a natural-sounding voice. Despite its recent advances, current SSR research has largely relied on speaker-dependent recognition. A high degree of variation in articulatory patterns across talkers has been a barrier to developing effective speaker-independent SSR approaches. Speaker-independent approaches, however, are critical for reducing the amount of training data required from each user; often only limited articulatory samples are available from individuals, due to the logistical difficulty of articulatory data collection. In this paper, we investigated speaker-independent silent speech recognition from tongue and lip movement data with two approaches that address across-talker variation: Procrustes matching, a physiological approach that minimizes across-talker physiological differences of the articulators, and speaker adaptive training, a data-driven approach. A silent speech data set was collected with an electromagnetic articulograph (EMA) from five English speakers while they silently articulated phrases, and was used to evaluate the two speaker-independent SSR approaches. The long-standing Gaussian mixture model-hidden Markov model (GMM-HMM) and the more recently adopted deep neural network-hidden Markov model (DNN-HMM) were used as recognizers. Experimental results showed the effectiveness of both normalization approaches.
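The abstract names Procrustes matching as the physiological normalization but does not spell out the procedure. Below is a minimal sketch of classical Procrustes alignment (translation, uniform scaling, and rotation via the SVD solution to the orthogonal Procrustes problem), assuming each speaker's tongue and lip sensor positions are stacked as an (N, d) point array and aligned to a single reference speaker; the function name, array shapes, and reference-template setup are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def procrustes_align(reference, articulatory):
    """Align one speaker's articulatory point set (e.g., EMA tongue/lip
    sensor positions, shape (N, d)) to a reference speaker's set via
    translation, uniform scaling, and rotation (Procrustes matching)."""
    # Translate both point sets so their centroids sit at the origin.
    ref = reference - reference.mean(axis=0)
    art = articulatory - articulatory.mean(axis=0)

    # Scale each set to unit Frobenius norm to remove overall size
    # differences (e.g., vocal tract size across talkers).
    ref /= np.linalg.norm(ref)
    art /= np.linalg.norm(art)

    # Optimal rotation from the SVD solution to the orthogonal
    # Procrustes problem: minimize ||art @ R - ref|| over rotations R.
    u, _, vt = np.linalg.svd(art.T @ ref)
    rotation = u @ vt

    # Guard against an improper rotation (a reflection).
    if np.linalg.det(rotation) < 0:
        u[:, -1] *= -1
        rotation = u @ vt

    # Return the speaker's points in normalized, reference-aligned space.
    return art @ rotation
```

In a speaker-independent pipeline of this kind, such a transform would be estimated once per speaker and then applied to that speaker's EMA trajectories before recognizer training and decoding, so that the GMM-HMM or DNN-HMM sees physiologically normalized features.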