ISCA Archive Eurospeech 1997

Improved bimodal speech recognition using tied-mixture HMMs and 5000 word audio-visual synchronous database

Satoshi Nakamura, Ron Nagai, Kiyohiro Shikano

This paper presents methods to improve speech recognition accuracy by incorporating automatic lip reading. Lip reading accuracy is improved through three approaches: 1) collection of an image-and-speech synchronous database of 5240 words, 2) extraction of two-dimensional power spectrum features around the mouth, and 3) sub-word unit HMMs with tied-mixture distributions (tied-mixture HMMs). Experiments on a 100-word test set show 85% accuracy by lip reading alone, and tied-mixture HMMs are shown to improve lip reading accuracy. Speech recognition experiments integrating audio-visual information are carried out at various SNRs. The results show that the integration always achieves better performance than using either audio or visual information alone.
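Audio-visual integration of the kind described in the abstract is commonly realized as a weighted combination of per-word log-likelihoods from the audio and visual HMM streams. The sketch below illustrates that general idea only; the function names, the fixed stream weight, and the linear weighting scheme are illustrative assumptions, not the paper's exact method.

```python
def combine_scores(audio_loglik, visual_loglik, weight=0.7):
    """Linearly combine log-likelihoods from the audio and visual
    streams. `weight` is an assumed fixed stream weight; in practice
    it would be tuned per SNR condition (higher weight on audio in
    clean conditions, lower in noise)."""
    return weight * audio_loglik + (1.0 - weight) * visual_loglik

def recognize(audio_scores, visual_scores, weight=0.7):
    """Pick the word whose combined score is highest.
    `audio_scores` and `visual_scores` map word -> log-likelihood
    produced by the respective HMM stream."""
    return max(
        audio_scores,
        key=lambda w: combine_scores(audio_scores[w], visual_scores[w], weight),
    )
```

With equal weighting, a word that scores moderately in both streams can beat a word that scores well acoustically but poorly visually, which is why bimodal integration helps under noise.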


doi: 10.21437/Eurospeech.1997-464

Cite as: Nakamura, S., Nagai, R., Shikano, K. (1997) Improved bimodal speech recognition using tied-mixture HMMs and 5000 word audio-visual synchronous database. Proc. 5th European Conference on Speech Communication and Technology (Eurospeech 1997), 1623-1626, doi: 10.21437/Eurospeech.1997-464

@inproceedings{nakamura97b_eurospeech,
  author={Satoshi Nakamura and Ron Nagai and Kiyohiro Shikano},
  title={{Improved bimodal speech recognition using tied-mixture HMMs and 5000 word audio-visual synchronous database}},
  year=1997,
  booktitle={Proc. 5th European Conference on Speech Communication and Technology (Eurospeech 1997)},
  pages={1623--1626},
  doi={10.21437/Eurospeech.1997-464},
  issn={1018-4074}
}