ISCA Archive Eurospeech 1999

Large Span statistical language models: application to homophone disambiguation for large vocabulary speech recognition in French

Frédéric Béchet, Alexis Nasr, Thierry Spriet, Renato de Mori

Homophones are one of the specific problems of Automatic Speech Recognition (ASR) in French. The phenomenon is particularly frequent for some inflections, such as the singular/plural inflection: 72% of the 40.7K lemmas in our 240K-word dictionary have inflected forms that are homophonic. To take into account word dependencies spanning a variable number of words, it is useful to combine local language models, such as 3-gram or 3-class models, with large-span models. In this paper we present two kinds of large-span models: a phrase-based model, which uses phrases obtained from a training corpus by means of a finite-state parser; and a homophone cache-based model, which derives constraints from word histories stored in a cache memory.
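
To make the idea of combining a local model with a cache of word history concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the method of the paper: the abstract does not say how the models are combined, so linear interpolation is assumed here, and the names `NumberCache` and `pick_homophone`, the weight `lam`, the number tags, and the probabilities are all hypothetical.

```python
from collections import deque


class NumberCache:
    """Hypothetical cache holding the grammatical number of recent words."""

    def __init__(self, size=10):
        # Only the most recent `size` tags are kept.
        self.tags = deque(maxlen=size)

    def add(self, tag):
        self.tags.append(tag)

    def prob(self, tag):
        # Fraction of cached words carrying this tag; 0.0 for an empty cache.
        if not self.tags:
            return 0.0
        return sum(t == tag for t in self.tags) / len(self.tags)


def pick_homophone(candidates, local_prob, cache, lam=0.5):
    """Choose among homophones by interpolating a local score and a cache score.

    candidates: list of (word, number_tag) pairs that sound alike.
    local_prob: assumed probabilities from a local model such as a 3-gram.
    lam: assumed interpolation weight given to the local model.
    """
    def score(item):
        word, tag = item
        return lam * local_prob.get(word, 0.0) + (1 - lam) * cache.prob(tag)

    return max(candidates, key=score)[0]


# History "les enfants": both words are plural.
cache = NumberCache()
for tag in ("plural", "plural"):
    cache.add(tag)

# "mange" and "mangent" are pronounced identically in French.
candidates = [("mange", "singular"), ("mangent", "plural")]
local_prob = {"mange": 0.55, "mangent": 0.45}  # made-up local scores

best = pick_homophone(candidates, local_prob, cache)
print(best)
```

In this toy setting the local scores slightly favor the singular form, while the plural history in the cache tips the choice toward "mangent". With an empty cache the function falls back to the local scores alone.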