ISCA Archive Interspeech 2005

Methods for combining language models in speech recognition

Simo Broman, Mikko Kurimo

Statistical language models play a vital part in contemporary speech recognition systems, and many language models have been presented in the literature. The best results have been achieved when different language models are used together. Several combination methods have been presented, but few comparisons of the different methods have been made.

In this work, three combination methods that have been used with language models are studied. In addition, a new approach based on likelihood density function estimation using histograms is presented. The methods are evaluated in speech recognition experiments and perplexity calculations. The test data consist of Finnish news articles, and four language models serve as the component models.
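As a rough illustration of how a perplexity comparison of this kind can be set up, the sketch below computes perplexity from per-word probabilities and combines component models by linear interpolation. Linear interpolation is used here only because it is a common combination method; the abstract does not say which three methods were studied, and the function names, weights, and probabilities are hypothetical.

```python
import math


def perplexity(word_probs):
    """Perplexity of a test text, given the probability a model assigned to each word."""
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))


def interpolate(component_probs, weights):
    """Linearly interpolate per-word probabilities from several component models.

    component_probs: one list of per-word probabilities per component model.
    weights: non-negative interpolation weights that sum to 1 (assumed, not checked).
    """
    return [
        sum(w * p for w, p in zip(weights, probs_at_word))
        for probs_at_word in zip(*component_probs)
    ]


def relative_improvement(baseline_ppl, combined_ppl):
    """Relative perplexity reduction against a baseline, in percent."""
    return 100.0 * (baseline_ppl - combined_ppl) / baseline_ppl
```

With helpers like these, the perplexity of a combined model can be compared with that of a baseline (a 4-gram model in the paper) on the same test text, and the difference reported as a percentage.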

In the perplexity experiments, all combination methods produced a statistically significant improvement over the 4-gram model that served as the baseline. The best result, a 46% improvement over the 4-gram model, was achieved by combining three language models using the new bin estimation method. In the speech recognition experiments, the unigram rescaling method achieved a 4% reduction in word error and a reduction of over 7% in phoneme error.