We present our approach to three major issues in statistical language modeling with large vocabularies and large corpora: (1) word probability smoothing, (2) task adaptation, and (3) fast access to word probabilities together with memory management. The last issue is particularly important for real-time speech recognition systems. To handle smoothing and adaptation, we designed a nondeterministic trigram language model. We address the last issue through caching.
Keywords: deterministic language model, dynamic adaptation, hidden Markov model, nondeterministic language model, static adaptation, statistical language modeling, trigram language modeling.