This paper describes an approach to estimating the parameters of continuous-density HMMs for visual speech recognition. A key issue in image-based visual speech recognition is the normalization of lip location and lighting conditions prior to estimating the HMM parameters. We present an average-intensity and location normalized training method, in which the normalization process is integrated into model training. The proposed method provides a theoretically well-defined algorithm based on a maximum likelihood formulation, so the likelihood of the training data is guaranteed to increase at each iteration of the normalized training. Experimental results show that the recognition performance is significantly improved by the normalized training.
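To illustrate the kind of preprocessing the abstract refers to, the sketch below normalizes a single grayscale frame for average intensity and lip location. This is a minimal, hypothetical example: the function name, the target-mean scaling, and the circular shift are assumptions for illustration, not the paper's actual normalized-training algorithm (which integrates normalization into the HMM likelihood maximization rather than applying it as a fixed preprocessing step).

```python
# Hypothetical per-frame normalization sketch; names and choices here
# (target_mean, circular shift via np.roll) are illustrative assumptions.
import numpy as np

def normalize_frame(frame, target_mean=128.0, lip_center=None):
    """Normalize average intensity and re-center the lip region.

    frame      : 2-D grayscale image as a float array
    target_mean: desired average intensity after normalization
    lip_center : (row, col) estimate of the lip location; if given,
                 the image is shifted so this point moves to the center
    """
    # Average-intensity normalization: scale so the mean matches the target.
    scale = target_mean / max(frame.mean(), 1e-8)
    out = frame * scale

    # Location normalization: translate the lip center to the image center.
    if lip_center is not None:
        rows, cols = out.shape
        dr = rows // 2 - int(lip_center[0])
        dc = cols // 2 - int(lip_center[1])
        out = np.roll(out, (dr, dc), axis=(0, 1))
    return out

# Example: a dark synthetic frame with an off-center bright "lip" blob
img = np.full((32, 32), 40.0)
img[20:24, 8:16] = 200.0
norm = normalize_frame(img, target_mean=128.0, lip_center=(22, 12))
```

In the paper's formulation, such normalization parameters would instead be re-estimated jointly with the HMM parameters at each training iteration, which is what yields the monotone likelihood guarantee.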