ISCA Archive Eurospeech 1993

Issues in large scale statistical language modeling

R. Zhao, P. Kenny, P. Labute, Douglas O'Shaughnessy

We present our approach to three major issues in statistical language modeling with large vocabularies and large corpora: (1) word probability smoothing, (2) task adaptation, and (3) speed of access to word probabilities together with memory management. The third issue is particularly important for real-time speech recognition systems. To handle smoothing and adaptation, we designed a nondeterministic trigram language model. We address the third issue by caching.

Keywords: deterministic language model, dynamic adaptation, hidden Markov model, nondeterministic language model, static adaptation, statistical language modeling, trigram language modeling.
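
As a rough illustration of how smoothing and caching can fit together in a trigram model, the following is a minimal sketch only. It assumes smoothing by linear interpolation of unigram, bigram and trigram relative frequencies with fixed weights, and an in-memory LRU cache in front of the probability lookup. The class name, the weights and the cache size are hypothetical choices made for this example; the abstract does not specify the authors' nondeterministic model or their caching scheme, and this sketch should not be read as either.

```python
from collections import Counter
from functools import lru_cache


class InterpolatedTrigramModel:
    """Toy trigram model: interpolation smoothing plus a cached lookup.

    Assumption: fixed interpolation weights (unigram, bigram, trigram)
    that sum to 1. This is an illustrative stand-in, not the model
    described in the paper.
    """

    def __init__(self, tokens, weights=(0.1, 0.3, 0.6), cache_size=10000):
        self.weights = weights
        self.total = len(tokens)
        self.unigrams = Counter(tokens)
        self.bigrams = Counter(zip(tokens, tokens[1:]))
        self.trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))
        # History counts only include positions that have a following word,
        # so each conditional distribution sums to 1 over seen continuations.
        self.bigram_hist = Counter(tokens[:-1])
        self.trigram_hist = Counter(zip(tokens[:-2], tokens[1:-1]))
        # Wrap the slow lookup in an LRU cache so repeated queries
        # for the same (w1, w2, w3) are answered from memory.
        self.prob = lru_cache(maxsize=cache_size)(self._prob)

    def _prob(self, w1, w2, w3):
        """Return the smoothed estimate of P(w3 | w1, w2)."""
        l1, l2, l3 = self.weights
        p1 = self.unigrams[w3] / self.total if self.total else 0.0
        h2 = self.bigram_hist[w2]
        p2 = self.bigrams[(w2, w3)] / h2 if h2 else 0.0
        h3 = self.trigram_hist[(w1, w2)]
        p3 = self.trigrams[(w1, w2, w3)] / h3 if h3 else 0.0
        return l1 * p1 + l2 * p2 + l3 * p3


if __name__ == "__main__":
    corpus = "a b c a b d".split()
    model = InterpolatedTrigramModel(corpus)
    print(model.prob("a", "b", "c"))   # first call computes the value
    print(model.prob("a", "b", "c"))   # second call is served from the cache
    print(model.prob.cache_info())
```

In this sketch an unseen trigram still receives nonzero probability whenever the word itself occurred in the training text, because the unigram term always contributes. The cache trades memory for lookup speed, and its size bound is the knob that a memory-constrained recognizer would tune.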