ISCA Archive Interspeech 2005

Growing an n-gram language model

Vesa Siivola, Bryan L. Pellom

Traditionally, when building an n-gram model, we decide the span of the model history, collect the relevant statistics and estimate the model. The model can be pruned down to a smaller size by manipulating the statistics or the estimated model. This paper shows how an n-gram model can be built by adding suitable sets of n-grams to a unigram model until the desired complexity is reached. Very high-order n-grams can be used in the model, since the proposed technique eliminates the need to handle the full unpruned model. We compare our growing method to entropy-based pruning. In Finnish speech recognition tests, the models trained by the growing method outperform entropy-pruned models of similar size.
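The abstract describes the growing procedure only at a high level. The Python sketch below illustrates the general idea under stated assumptions: it starts from a maximum-likelihood unigram model and, one order at a time, counts only those n-grams whose history is already in the model, adding the ones that pass a selection test. The paper's actual selection criterion is not stated here, so the sketch assumes a count-weighted log-likelihood gain, c(h,w) log(P_ML(w|h) / P_old(w|h)), measured against the current model's estimate. It also omits discounting and backoff weights, so the result is not a properly normalized language model. The names `grow`, `count_ngrams`, `backoff_prob` and the `threshold` parameter are hypothetical.

```python
import math
from collections import Counter


def count_ngrams(sentences, order, prefixes=None):
    """Count n-grams of one order, padding each sentence with <s> and </s>.

    If `prefixes` is given, only n-grams whose (order-1)-word prefix is in it
    are counted, so the full table of high-order n-grams is never built.
    """
    counts = Counter()
    for sent in sentences:
        toks = ["<s>"] + list(sent) + ["</s>"]
        for i in range(len(toks) - order + 1):
            gram = tuple(toks[i:i + order])
            if prefixes is None or gram[:-1] in prefixes:
                counts[gram] += 1
    return counts


def backoff_prob(model, history, word):
    """Crude estimate of P(word | history) from the current model: use the
    longest history suffix whose extension by `word` is in the model.
    No backoff weights are applied, so this is not normalized."""
    for start in range(len(history) + 1):
        gram = history[start:] + (word,)
        if gram in model:
            return model[gram]
    raise KeyError(word)  # word was never seen as a unigram


def grow(sentences, max_order, threshold):
    """Hypothetical sketch: grow a model from unigrams, one order at a time."""
    unigrams = count_ngrams(sentences, 1)
    total = sum(unigrams.values())
    model = {g: c / total for g, c in unigrams.items()}  # ML unigram model

    for k in range(2, max_order + 1):
        # Only histories that are already in the model may be extended.
        prefixes = {g for g in model if len(g) == k - 1}
        counts = count_ngrams(sentences, k, prefixes)
        history_totals = Counter()
        for g, c in counts.items():
            history_totals[g[:-1]] += c

        added = {}
        for g, c in counts.items():
            p_ml = c / history_totals[g[:-1]]
            p_old = backoff_prob(model, g[:-1], g[-1])
            # Assumed criterion: count-weighted log-likelihood gain.
            if c * math.log(p_ml / p_old) > threshold:
                added[g] = p_ml
        if not added:
            break  # nothing worth adding at this order; stop growing
        model.update(added)
    return model


corpus = [["a", "b", "c"], ["a", "b", "d"], ["a", "b", "c"]]
model = grow(corpus, max_order=5, threshold=2.0)
print(len(model))  # 10: six unigrams plus four bigrams
```

In this toy run the bigrams ("b", "d") and ("d", "</s>") occur only once, so their assumed gain of ln 5 ≈ 1.61 stays below the threshold and they are not added. No trigram improves on the bigram estimates, so growth stops before order 3 even though `max_order` is 5. Because only histories already in the model are counted at each step, the cost depends on the grown model rather than on the full unpruned n-gram table.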

doi: 10.21437/Interspeech.2005-24

Cite as: Siivola, V., Pellom, B.L. (2005) Growing an n-gram language model. Proc. Interspeech 2005, 1309-1312, doi: 10.21437/Interspeech.2005-24

  author={Vesa Siivola and Bryan L. Pellom},
  title={{Growing an n-gram language model}},
  booktitle={Proc. Interspeech 2005},