A posteriori multiple word-domain language model

Elvira I. Sicilia-Garcia, Ji Ming, F. Jack Smith

It is shown that the enormous growth in disk storage capacity in recent years can be used to build multiple word-domain statistical language models, one for each significant word of a language. Each of these word-domain language models is a precise domain model for its significant word, and when combined appropriately they provide a highly specific domain language model for the language following a cache, even a short cache. A Multiple Word-Domain model based on 20,000 individual word language models has been constructed and tested on a Wall Street Journal corpus. In tests, perplexity improved by between 25% and 68% over a baseline trigram model.
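As a rough illustration of the idea of combining per-word domain models with a base model, the following minimal sketch interpolates a base distribution with the domain models of the words found in a cache and compares perplexities. It is a toy under stated assumptions, not the authors' method: it uses unigram rather than trigram models, weights the active domain models equally rather than a posteriori, and all names (`unigram_model`, `combine`, `perplexity`, `base_weight`) and the tiny corpus are hypothetical.

```python
import math
from collections import Counter

def unigram_model(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def combine(base, domain_models, cache, base_weight=0.5):
    """Interpolate the base model with the domain models of cached words.

    Assumption: every active domain model gets an equal share of the
    remaining weight; the paper's actual weighting is not reproduced here.
    """
    active = [domain_models[w] for w in cache if w in domain_models]
    if not active:
        return dict(base)
    share = (1.0 - base_weight) / len(active)
    return {
        w: base_weight * p + share * sum(m[w] for m in active)
        for w, p in base.items()
    }

def perplexity(model, tokens):
    """Perplexity of a token sequence under a unigram distribution."""
    log_prob = sum(math.log2(model[t]) for t in tokens)
    return 2 ** (-log_prob / len(tokens))

# Hypothetical toy data: a general corpus and the text seen near "stock".
vocab = ["stock", "market", "fell", "rain", "cloud", "sky"]
base = unigram_model(vocab, vocab)
domain_models = {
    "stock": unigram_model(["market", "fell", "market", "stock"], vocab),
}

cache = ["stock"]          # words observed so far
test = ["market", "fell"]  # text following the cache
combined = combine(base, domain_models, cache)

print("base perplexity:    ", perplexity(base, test))
print("combined perplexity:", perplexity(combined, test))
```

In this toy setting the combined model assigns more probability to words that co-occur with the cached word, so its perplexity on the following text is lower than that of the base model alone.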

doi: 10.21437/Interspeech.2005-18

Cite as: Sicilia-Garcia, E.I., Ming, J., Smith, F.J. (2005) A posteriori multiple word-domain language model. Proc. Interspeech 2005, 1285-1288, doi: 10.21437/Interspeech.2005-18

