ISCA Archive Eurospeech 1997

An hybrid language model for a continuous dictation prototype

K. Smaili, I. Zitouni, F. Charpillet, Jean-Paul Haton

This paper describes the combination of a stochastic language model and a formal grammar expressed as a unification grammar. The stochastic model is trained on 42 million words extracted from the newspaper Le Monde. It is based on smoothed 3-gram and 3-class models. The 3-class model is represented by a Markov chain made up of four states. Several experiments were carried out to determine which values are best for a given training and test corpus. The experiments indicate that the unification grammar strongly reduces the number of hypotheses (sentences) produced by the stochastic model.
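
As an illustration of what combining a smoothed 3-gram with a 3-class model can look like, here is a minimal sketch. The abstract does not say which smoothing scheme or combination rule the authors used, so linear interpolation is assumed here as a common stand-in. The names `InterpolatedTrigram`, `hybrid_prob`, `lambdas` and `mu`, the weight values, the toy corpus and the word classes are all hypothetical.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


class InterpolatedTrigram:
    """Trigram estimate smoothed by linear interpolation with bigram and unigram estimates."""

    def __init__(self, tokens, lambdas=(0.6, 0.3, 0.1)):
        # Hypothetical weights (they must sum to 1); normally tuned on held-out data.
        self.l3, self.l2, self.l1 = lambdas
        self.c1 = ngram_counts(tokens, 1)
        self.c2 = ngram_counts(tokens, 2)
        self.c3 = ngram_counts(tokens, 3)
        self.total = len(tokens)

    def prob(self, x1, x2, x3):
        """Return the interpolated estimate of P(x3 | x1, x2)."""
        p1 = self.c1[(x3,)] / self.total
        p2 = self.c2[(x2, x3)] / self.c1[(x2,)] if self.c1[(x2,)] else 0.0
        p3 = self.c3[(x1, x2, x3)] / self.c2[(x1, x2)] if self.c2[(x1, x2)] else 0.0
        return self.l3 * p3 + self.l2 * p2 + self.l1 * p1


def hybrid_prob(word_lm, class_lm, word_class, w1, w2, w3, mu=0.5):
    """Mix the word 3-gram with a 3-class estimate P(c3 | c1, c2) * P(w3 | c3)."""
    c1, c2, c3 = word_class[w1], word_class[w2], word_class[w3]
    # P(w3 | c3): relative frequency of the word within its class.
    p_word_given_class = word_lm.c1[(w3,)] / class_lm.c1[(c3,)]
    p_class = class_lm.prob(c1, c2, c3) * p_word_given_class
    return mu * word_lm.prob(w1, w2, w3) + (1 - mu) * p_class


# Toy data, invented for illustration only.
tokens = "le chat dort le chien dort le chat mange".split()
word_class = {"le": "DET", "chat": "N", "chien": "N", "dort": "V", "mange": "V"}
word_lm = InterpolatedTrigram(tokens)
class_lm = InterpolatedTrigram([word_class[w] for w in tokens])
print(hybrid_prob(word_lm, class_lm, word_class, "le", "chat", "dort"))
```

In this sketch the class model backs up the word model when a word sequence is rare but its class sequence is frequent. The unification grammar that the paper uses to filter the resulting sentence hypotheses is not modelled here.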