ISCA Archive Eurospeech 1997

Experiments in adaptation of language models for commercial applications

Petra Witschel, Harald Höge

To improve the recognition accuracy of large-vocabulary speech recognition systems, we use language models based on linguistic classes (extended POS). This paper presents an adaptation technique that exploits linguistic knowledge about words of a new domain that are unknown in the base domain. When switching from the base domain to the new domain, we keep the bigram probabilities of the linguistic classes fixed and adapt only the word monogram (unigram) probabilities. Our experiments use three corpora: the financial columns of a newspaper corpus and two medical corpora (computer tomography and magnetic resonance). The adapted language models improve test-set perplexity by 48% to 51% compared with assigning unknown words to the language model's "unknown" class.
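The adaptation idea in the abstract can be illustrated with a minimal Python sketch. It assumes a class-based bigram of the form P(w_i | w_{i-1}) = P(c_i | c_{i-1}) * P(w_i | c_i), where each word belongs to exactly one linguistic class and P(w | c) is a plain relative-frequency estimate. These simplifications, the function names, the class labels, and the toy data are all illustrative assumptions and are not taken from the paper.

```python
from collections import Counter

def estimate_word_given_class(tokens, word_to_class):
    """Relative-frequency estimate of P(word | class) from a token list.

    Assumption: every word maps to exactly one linguistic class.
    """
    word_counts = Counter(tokens)
    class_totals = Counter()
    for word, count in word_counts.items():
        class_totals[word_to_class[word]] += count
    return {w: n / class_totals[word_to_class[w]] for w, n in word_counts.items()}

def bigram_prob(prev_word, word, class_bigram, word_given_class, word_to_class):
    """P(word | prev_word) = P(class(word) | class(prev_word)) * P(word | class(word))."""
    c_prev = word_to_class[prev_word]
    c_cur = word_to_class[word]
    return class_bigram[(c_prev, c_cur)] * word_given_class[word]

# Class bigram probabilities, assumed to be estimated on the base domain.
# They stay fixed during adaptation (toy values).
class_bigram = {
    ("DET", "NOUN"): 0.9,
    ("NOUN", "VERB"): 0.6,
    ("VERB", "DET"): 0.7,
}

# New-domain lexicon: linguistic knowledge assigns each new word a class
# instead of sending it to a generic "unknown" class (toy example).
new_lexicon = {"the": "DET", "scanner": "NOUN", "tomogram": "NOUN", "shows": "VERB"}
new_tokens = ["the", "scanner", "shows", "the", "tomogram"]

# Adaptation step: only the word-level probabilities are re-estimated.
adapted = estimate_word_given_class(new_tokens, new_lexicon)
p = bigram_prob("the", "scanner", class_bigram, adapted, new_lexicon)
```

In this sketch, moving to another domain only requires a new lexicon and new word counts; the `class_bigram` table is reused unchanged, which mirrors the fixed class bigrams described above.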