ISCA Archive Eurospeech 2001

Lip-reading from parametric lip contours for audio-visual speech recognition

Sabri Gurbuz, Eric K. Patterson, Zekeriya Tufekci, John N. Gowdy

This paper describes the incorporation of a visual lip-tracking and lipreading algorithm that uses affine-invariant Fourier descriptors (AI-FDs) from parametric lip contours to improve audio-visual speech recognition systems. The audio-visual speech recognition system presented here uses parallel hidden Markov models (HMMs), where a joint decision is made after processing using an optimal decision rule. This work describes the extraction of AI-FDs from parametric lip contour data. Finally, it validates the use of optimal weight selection, based on the noise type and signal-to-noise ratio (SNR), for joint audio-visual automatic speech recognition (JAV-ASR).
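
To illustrate the joint decision made after the parallel HMMs have processed their streams, the following is a minimal sketch of weighted late fusion. It is not the authors' decision rule, which the abstract does not specify. It assumes that each stream yields one log-likelihood per candidate word and that the two streams are combined with a convex weight. The function names, the `WEIGHT_TABLE` entries, and all numeric scores are hypothetical values chosen only for illustration.

```python
# Assumption: linear (convex) weighting of per-word log-likelihoods from the
# audio and visual HMM streams. The abstract does not state the actual rule.

def fuse_log_likelihoods(audio_ll, visual_ll, audio_weight):
    """Return per-word fused scores: w * audio + (1 - w) * visual."""
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return {
        word: audio_weight * audio_ll[word] + (1.0 - audio_weight) * visual_ll[word]
        for word in audio_ll
    }


def joint_decision(audio_ll, visual_ll, audio_weight):
    """Pick the word with the highest fused score."""
    fused = fuse_log_likelihoods(audio_ll, visual_ll, audio_weight)
    return max(fused, key=fused.get)


# Hypothetical weight table indexed by (noise type, SNR in dB).
# Real values would have to be tuned on held-out data.
WEIGHT_TABLE = {
    ("white", 0): 0.2,   # noisy audio: lean on the visual stream
    ("white", 20): 0.8,  # cleaner audio: lean on the audio stream
}

# Hypothetical log-likelihoods for a two-word vocabulary.
audio_scores = {"yes": -120.0, "no": -118.0}
visual_scores = {"yes": -40.0, "no": -46.0}

decision_noisy = joint_decision(audio_scores, visual_scores, WEIGHT_TABLE[("white", 0)])
decision_clean = joint_decision(audio_scores, visual_scores, WEIGHT_TABLE[("white", 20)])
```

With these made-up scores the audio stream slightly prefers "no" and the visual stream prefers "yes", so the chosen word depends on the weight selected for the noise condition. That dependence is what makes weight selection by noise type and SNR matter.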