ISCA Archive Eurospeech 1997

Adaptive topic-dependent language modelling using word-based varigrams

Sven C. Martin, Jörg Liermann, Hermann Ney

This paper presents two extensions of the standard interpolated word trigram and cache model: the extension of the trigram model by useful word m-grams with m > 3, resulting in a varigram model, and the addition of topic-specific trigram models. We give the criteria for selecting useful m-grams and for partitioning the training corpus into topic-specific subcorpora. We apply both extensions, separately and in combination, to corpora of 4 and 39 million words taken from the Wall Street Journal Corpus and show that perplexity reductions of up to 19% are achieved on the largest corpus. We also performed some recognition experiments.
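
As a rough illustration of the quantities named in the abstract, the following minimal sketch shows how the probabilities of several component models (for example a trigram, a cache, and a topic-specific model) could be combined by linear interpolation, and how perplexity and a relative perplexity reduction are computed. It is not the authors' implementation: the function names, the weights, and the toy numbers are assumptions chosen for illustration, and the paper's criteria for selecting m-grams and partitioning the corpus are not reproduced here.

```python
import math


def interpolate(component_probs, weights):
    """Linearly interpolate the probabilities that several models assign to one word.

    component_probs and weights are hypothetical inputs: one probability and
    one non-negative weight per component model, with the weights summing to 1.
    """
    if len(component_probs) != len(weights):
        raise ValueError("one weight per component model is required")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    return sum(w * p for w, p in zip(weights, component_probs))


def perplexity(word_probs):
    """Perplexity of a word sequence: exp of the negative mean log probability."""
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))


def relative_reduction(baseline_pp, new_pp):
    """Relative perplexity reduction of new_pp with respect to baseline_pp."""
    return (baseline_pp - new_pp) / baseline_pp
```

With this sketch, a test text in which every word receives probability 0.25 has perplexity 4, and a drop from a perplexity of 100 to 81 (made-up values) corresponds to a relative reduction of 19%.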