ISCA Archive Eurospeech 1997

Source normalization training for HMM applied to noisy telephone speech recognition

Yifan Gong

We refer to an environment e as some combination of speaker, handset, transmission channel and background noise condition, and regard any practical operating situation of a speech recognizer as a mixture of environments. A speech recognizer may be trained on multi-environment data. It may also need to adapt the trained acoustic models to new conditions. How to train an HMM with multi-environment data, and from what seed model to start an adaptation, are two questions of great importance. We propose a new solution to speech recognition which, for both training and adaptation, is based on separate modeling of phonetic variation and environment variation. The problem is formulated under a hidden Markov process, where we assume:

- Speech x is generated by some canonical distributions (independent of environmental factors).
- An unknown linear transformation W_e and a bias b_e, specific to environment e, are applied to x with probability P(e).
- x cannot be observed; what we observe is the outcome of the transformation: o = W_e x + b_e.

Under the maximum-likelihood (ML) criterion, by applying the EM algorithm and extending Baum's forward and backward variables and algorithm, we obtained a joint solution for the parameters of the canonical distributions, the transformations and the biases, which is novel. For special cases, on a noisy telephone speech database, the new formulation is compared to the per-utterance cepstral mean normalization (CMN) technique and shows more than 20% word error rate improvement.
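The generative assumption stated in the abstract (hidden canonical speech x, an environment e drawn with probability P(e), and an observation o = W_e x + b_e) can be illustrated with a small simulation. The sketch below is only an illustration under several assumptions that are not in the abstract: a single 2-dimensional Gaussian stands in for the canonical distributions, one environment is drawn per utterance, and all numeric values and function names (`sample_utterance`, `env_log_likelihood`) are hypothetical. It does not implement the paper's EM training; it only uses the standard fact that an affine transform of a Gaussian N(mu, Sigma) is the Gaussian N(W_e mu + b_e, W_e Sigma W_e^T).

```python
import numpy as np

def sample_utterance(rng, n_frames, mu, sigma, W, b, p_env):
    """Draw one utterance: pick an environment, then transform canonical frames."""
    e = int(rng.choice(len(p_env), p=p_env))               # environment drawn with P(e)
    x = rng.multivariate_normal(mu, sigma, size=n_frames)  # hidden canonical speech
    o = x @ W[e].T + b[e]                                  # observed: o = W_e x + b_e
    return e, x, o

def env_log_likelihood(o, mu, sigma, W_e, b_e):
    """Log-likelihood of frames o under N(W_e mu + b_e, W_e sigma W_e^T)."""
    mean = W_e @ mu + b_e
    cov = W_e @ sigma @ W_e.T
    diff = o - mean
    d = mu.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(cov), diff)
    return float(np.sum(-0.5 * (d * np.log(2 * np.pi) + logdet + maha)))

# Hypothetical toy parameters (not taken from the paper).
rng = np.random.default_rng(0)
mu = np.array([0.0, 1.0])
sigma = np.array([[1.0, 0.2], [0.2, 0.5]])
W = np.array([[[1.0, 0.0], [0.0, 1.0]],      # environment 0: identity transform
              [[0.8, 0.1], [0.0, 1.2]]])     # environment 1: mild linear distortion
b = np.array([[0.0, 0.0], [2.0, -1.0]])      # per-environment biases
p_env = np.array([0.5, 0.5])                 # P(e)

e, x, o = sample_utterance(rng, 200, mu, sigma, W, b, p_env)

# If the environment were known, the canonical frames follow by inverting the transform.
x_rec = np.linalg.solve(W[e], (o - b[e]).T).T

# With known parameters, the most likely environment can be picked by likelihood.
scores = [env_log_likelihood(o, mu, sigma, W[k], b[k]) for k in range(len(p_env))]
e_hat = int(np.argmax(scores))
```

In this toy setting the parameters W_e and b_e are given; in the formulation described above they are unknown and are estimated jointly with the canonical distributions, which this sketch does not attempt.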


doi: 10.21437/Eurospeech.1997-447

Cite as: Gong, Y. (1997) Source normalization training for HMM applied to noisy telephone speech recognition. Proc. 5th European Conference on Speech Communication and Technology (Eurospeech 1997), 1555-1558, doi: 10.21437/Eurospeech.1997-447

@inproceedings{gong97_eurospeech,
  author={Yifan Gong},
  title={{Source normalization training for HMM applied to noisy telephone speech recognition}},
  year=1997,
  booktitle={Proc. 5th European Conference on Speech Communication and Technology (Eurospeech 1997)},
  pages={1555--1558},
  doi={10.21437/Eurospeech.1997-447},
  issn={1018-4074}
}