ISCA Archive Eurospeech 1997

Speaker normalization and speaker adaptation - a combination for conversational speech recognition

Puming Zhan, Martin Westphal, Michael Finke, Alex Waibel

Speaker normalization and speaker adaptation are two strategies for handling variation due to speaker, channel, and environment. Vocal tract length normalization (VTLN) is an effective speaker normalization approach that compensates for variation in vocal tract shape. Maximum Likelihood Linear Regression (MLLR) is a recently proposed method for speaker adaptation. In this paper, we propose a speaker-specific Bark scale VTLN method, investigate the combination of VTLN with MLLR, and present an iterative procedure for decoding with the combined VTLN and MLLR system. The results show that: (1) the new VTLN method is very effective, reducing the word error rate by up to 11%; (2) the combination of VTLN and MLLR reduces the word error rate by up to 15%; (3) both VTLN and MLLR are more effective on push-to-talk data than on cross-talk data.
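The abstract does not spell out the speaker-specific Bark scale, so the following is only a minimal sketch of the general idea behind warping-based VTLN, not the authors' formulation. It assumes Traunmüller's approximation of the Bark scale, a simple linear scaling of the frequency axis by a per-speaker factor `alpha`, and a grid search that keeps the factor with the best score. The function names, the candidate grid, and the `score` callback (standing in for an acoustic-model likelihood) are all hypothetical.

```python
from typing import Callable, Iterable


def hz_to_bark(f_hz: float) -> float:
    """Traunmueller's approximation of the Bark scale (an assumed choice)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bark_to_hz(z: float) -> float:
    """Algebraic inverse of hz_to_bark."""
    return 1960.0 * (z + 0.53) / (26.28 - z)


def speaker_bark(f_hz: float, alpha: float) -> float:
    """Hypothetical speaker-specific Bark value: scale the frequency
    axis by the warp factor alpha, then map to Bark."""
    return hz_to_bark(alpha * f_hz)


def pick_warp_factor(score: Callable[[float], float],
                     candidates: Iterable[float]) -> float:
    """Return the candidate warp factor with the highest score.
    In a real system the score would be the likelihood of the
    speaker's warped features under the acoustic model."""
    return max(candidates, key=score)


if __name__ == "__main__":
    # Illustrative grid from 0.88 to 1.12 in steps of 0.02 (assumed values).
    grid = [0.88 + 0.02 * i for i in range(13)]
    # Toy score that peaks at 1.04, used only to exercise the search.
    best = pick_warp_factor(lambda a: -(a - 1.04) ** 2, grid)
    print(f"selected warp factor: {best:.2f}")
    print(f"1 kHz in Bark for this speaker: {speaker_bark(1000.0, best):.3f}")
```

In a combined system of the kind the abstract describes, a warp factor chosen this way would be applied to the features before MLLR adapts the model parameters; how the two steps are iterated during decoding is specified in the paper itself, not in this sketch.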


doi: 10.21437/Eurospeech.1997-552

Cite as: Zhan, P., Westphal, M., Finke, M., Waibel, A. (1997) Speaker normalization and speaker adaptation - a combination for conversational speech recognition. Proc. 5th European Conference on Speech Communication and Technology (Eurospeech 1997), 2087-2090, doi: 10.21437/Eurospeech.1997-552

@inproceedings{zhan97_eurospeech,
  author={Puming Zhan and Martin Westphal and Michael Finke and Alex Waibel},
  title={{Speaker normalization and speaker adaptation - a combination for conversational speech recognition}},
  year=1997,
  booktitle={Proc. 5th European Conference on Speech Communication and Technology (Eurospeech 1997)},
  pages={2087--2090},
  doi={10.21437/Eurospeech.1997-552},
  issn={1018-4074}
}