ISCA Archive Eurospeech 1999

Intensity- and location-normalized training for HMM-based visual speech recognition

Yoshihiko Nankaku, Keiichi Tokuda, Tadashi Kitamura

This paper describes an approach to estimating the parameters of continuous density HMMs for visual speech recognition. One of the key issues in image-based visual speech recognition is the normalization of lip location and lighting conditions prior to estimating the HMM parameters. We present an average-intensity and location normalized training method, in which the normalization process is integrated into the model training. The proposed method provides a theoretically well-defined algorithm based on a maximum likelihood formulation; hence the likelihood of the training data is guaranteed to increase at each iteration of the normalized training. Experimental results show that recognition performance is significantly improved by the normalized training.
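As a rough illustration of the idea, and not the paper's exact algorithm, the sketch below alternates between estimating a per-sequence intensity offset and re-estimating the output distribution, with the full HMM simplified to a single diagonal Gaussian. Because each step is a maximum likelihood update with the other quantities held fixed, the training-data likelihood cannot decrease, mirroring the monotonicity property stated above. The function name normalized_training, the single-Gaussian simplification, and the toy data are assumptions for illustration only; location normalization would be handled analogously inside the same loop by searching over spatial shifts of the lip region.

# Minimal sketch of intensity-normalized training (assumed simplification:
# one diagonal Gaussian stands in for the HMM output distributions).
# Each training sequence r is assumed to carry an unknown scalar intensity
# offset b_r added to every pixel; the offsets and the Gaussian parameters
# are re-estimated alternately, so the likelihood is non-decreasing.
import numpy as np

def normalized_training(sequences, n_iter=10):
    """sequences: list of (T_r, D) arrays of image feature vectors."""
    D = sequences[0].shape[1]
    mu = np.zeros(D)
    var = np.ones(D)
    offsets = np.zeros(len(sequences))

    for _ in range(n_iter):
        # Step 1: per-sequence ML intensity offset under the current model.
        for r, x in enumerate(sequences):
            w = 1.0 / var                              # precision weights
            offsets[r] = np.sum((x - mu) * w) / (x.shape[0] * np.sum(w))

        # Step 2: re-estimate Gaussian parameters from normalized frames.
        norm = np.vstack([x - offsets[r] for r, x in enumerate(sequences)])
        mu = norm.mean(axis=0)
        var = norm.var(axis=0) + 1e-6                  # variance floor

    return mu, var, offsets

# Toy usage: three random "lip image" sequences with different brightness offsets.
rng = np.random.default_rng(0)
seqs = [rng.normal(size=(20, 16)) + shift for shift in (0.0, 2.0, -1.5)]
mu, var, offsets = normalized_training(seqs)
print(offsets)  # recovered per-sequence offsets (up to a common constant)

In a full implementation the two updates would be interleaved with the usual Baum-Welch statistics, so the offsets are weighted by state occupancies rather than computed against a single Gaussian; the coordinate-ascent structure, and hence the monotone likelihood increase, is the same.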