ISCA Archive Eurospeech 1997

Improved bimodal speech recognition using tied-mixture HMMs and 5000 word audio-visual synchronous database

Satoshi Nakamura, Ron Nagai, Kiyohiro Shikano

This paper presents methods to improve speech recognition accuracy by incorporating automatic lip reading. Lip-reading accuracy is improved through three approaches: 1) collection of an image and speech synchronous database of 5240 words; 2) feature extraction of two-dimensional power spectra around the mouth; and 3) sub-word unit HMMs with tied-mixture distributions (tied-mixture HMMs). Experiments on a 100-word test show 85% accuracy by lip reading alone. It is also shown that tied-mixture HMMs improve lip-reading accuracy. Speech recognition experiments integrating audio-visual information are carried out over various SNRs. The results show that the integration always achieves better performance than using either audio or visual information alone.
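The abstract does not specify how the audio and visual streams are integrated. A common late-fusion scheme, consistent with the claim that the combination outperforms either modality alone, is a weighted sum of per-stream log-likelihoods; the sketch below illustrates that idea (the function names and the fixed stream weight are illustrative assumptions, not the paper's method).

```python
def combine_stream_scores(audio_logp: float, visual_logp: float,
                          w_audio: float) -> float:
    """Late-fusion score: weighted sum of per-stream log-likelihoods.

    w_audio in [0, 1] is an assumed stream weight (e.g. tuned per SNR);
    the visual stream gets the complementary weight 1 - w_audio.
    """
    return w_audio * audio_logp + (1.0 - w_audio) * visual_logp


def recognize(word_scores: dict, w_audio: float) -> str:
    """Pick the word whose combined audio-visual score is highest.

    word_scores maps each candidate word to a tuple
    (audio log-likelihood, visual log-likelihood).
    """
    return max(word_scores,
               key=lambda w: combine_stream_scores(*word_scores[w], w_audio))
```

In such a scheme, lowering `w_audio` as the acoustic SNR degrades lets the visual stream dominate, which is one way an audio-visual recognizer can remain at least as accurate as its better single stream.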