ISCA Archive Eurospeech 1997

Text normalization and speech recognition in French

Gilles Adda, Martine Adda-Decker, Jean-Luc Gauvain, Lori Lamel

In this paper we present a quantitative investigation into the impact of text normalization on lexica and language models for speech recognition in French. The text normalization process defines what is considered to be a word by the recognition system. Depending on this definition we can measure different lexical coverages and language model perplexities, both of which are closely related to the speech recognition accuracies obtained on read newspaper texts. Different text normalizations of up to 185M words of newspaper texts are presented along with corresponding lexical coverage and perplexity measures. Some normalizations were found to be necessary to achieve good lexical coverage, while others were more or less equivalent in this regard. The choice of normalization to create language models for use in the recognition experiments with read newspaper texts was based on these findings. Our best system configuration obtained an 11.2% word error rate in the AUPELF 'French-speaking' speech recognizer evaluation test held in February 1997.
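
As an illustration of how the definition of a word affects lexical coverage, the following minimal sketch compares two normalizations on a toy corpus. The normalization rules (lower-casing, splitting after apostrophes), the function names, and the example sentences are assumptions chosen for illustration; they are not taken from the paper and do not reproduce its experimental setup.

```python
import re
from collections import Counter


def norm_raw(text):
    """Hypothetical baseline: leave the text untouched."""
    return text


def norm_split(text):
    """Hypothetical normalization: lower-case the text and insert a space
    after an apostrophe that joins two words ("l'homme" -> "l' homme")."""
    return re.sub(r"(\w')(?=\w)", r"\1 ", text.lower())


def lexical_coverage(train_text, test_text, normalize, vocab_size):
    """Fraction of test tokens found in a lexicon built from the
    vocab_size most frequent training tokens."""
    train_tokens = normalize(train_text).split()
    test_tokens = normalize(test_text).split()
    lexicon = {w for w, _ in Counter(train_tokens).most_common(vocab_size)}
    covered = sum(1 for w in test_tokens if w in lexicon)
    return covered / len(test_tokens)


# Toy data, invented for this example.
train = "l'homme parle . L'enfant écoute l'homme ."
test = "l'enfant parle"

raw_cov = lexical_coverage(train, test, norm_raw, vocab_size=10)
split_cov = lexical_coverage(train, test, norm_split, vocab_size=10)
print(raw_cov, split_cov)
```

On this toy data the unnormalized text misses "l'enfant" because only the capitalized form occurs in training, while the split, lower-cased version covers every test token. Real measurements would of course require large text corpora and a fixed lexicon size.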

doi: 10.21437/Eurospeech.1997-684

Cite as: Adda, G., Adda-Decker, M., Gauvain, J.-L., Lamel, L. (1997) Text normalization and speech recognition in French. Proc. 5th European Conference on Speech Communication and Technology (Eurospeech 1997), 2711-2714, doi: 10.21437/Eurospeech.1997-684

  author={Gilles Adda and Martine Adda-Decker and Jean-Luc Gauvain and Lori Lamel},
  title={{Text normalization and speech recognition in French}},
  booktitle={Proc. 5th European Conference on Speech Communication and Technology (Eurospeech 1997)},