This paper proposes a speaker adaptation method that starts from speaker-independent HMMs and, during adaptation training, emphasizes the feature distribution of the target speaker's speech. A mixture Gaussian HMM with a large number of distributions attains good accuracy in speaker-independent recognition when sufficient training speech is available. The proposed method takes such speaker-independent mixture Gaussian HMMs as initial models and modifies their mixture coefficients to maximize the likelihood of the target speaker's training speech. Although training is supervised, the method requires no phoneme segmentation or labeling of the training speech.
Models adapted with this method were compared against speaker-independent and speaker-dependent models. When the number of training words was less than 200, the speaker-adapted model achieved higher recognition accuracy than either the speaker-independent or the speaker-dependent model.
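As an illustration of how modifying mixture coefficients can raise the likelihood for a target speaker, the following is a minimal sketch and not necessarily the update used in this paper. It assumes the standard EM (Baum-Welch style) re-estimation of mixture weights, reduced to a single state with one-dimensional Gaussians whose means and variances stay fixed at their speaker-independent values. In a full HMM, the per-frame responsibilities would be replaced by state and mixture occupation probabilities obtained from the forward-backward algorithm. All names and numbers (`adapt_mixture_weights`, `si_weights`, `target_frames`, and so on) are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def log_likelihood(data, weights, means, variances):
    """Log-likelihood of the data under a Gaussian mixture."""
    return sum(
        math.log(sum(w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)))
        for x in data
    )

def adapt_mixture_weights(data, weights, means, variances, iterations=10):
    """Re-estimate mixture weights with EM; means and variances are not changed."""
    weights = list(weights)
    for _ in range(iterations):
        counts = [0.0] * len(weights)
        for x in data:
            # E-step: responsibility of each component for this frame.
            joint = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            for k, p in enumerate(joint):
                counts[k] += p / total
        # M-step: new weight is the average responsibility of the component.
        weights = [c / len(data) for c in counts]
    return weights

# Hypothetical speaker-independent mixture (assumed values, for illustration only).
si_weights = [1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0]
means = [-2.0, 0.0, 2.0]
variances = [1.0, 1.0, 1.0]

# Hypothetical adaptation frames from a target speaker, concentrated near 2.0.
target_frames = [1.6, 2.1, 1.9, 2.4, 0.3, 2.2]

adapted = adapt_mixture_weights(target_frames, si_weights, means, variances)
before = log_likelihood(target_frames, si_weights, means, variances)
after = log_likelihood(target_frames, adapted, means, variances)
print("adapted weights:", [round(w, 3) for w in adapted])
print("log-likelihood before and after:", round(before, 3), round(after, 3))
```

In this sketch the weight of the component nearest the target speaker's frames grows while the Gaussians themselves are untouched, so the adapted mixture keeps the coverage of the speaker-independent model and shifts emphasis toward the regions of feature space the target speaker actually occupies.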