ISCA Archive Interspeech 2005

Improving out-of-coverage language modelling in a multimodal dialogue system using small training sets

Louis ten Bosch

For automatic speech recognition, constructing an adequate language model can be difficult when only a limited amount of training text is available. Previous work has shown that, with small training sets, statistical language models may outperform grammars on out-of-coverage utterances while performing comparably on in-coverage input. In this paper, we compare the performance of an automatic speech recognition system when it uses a grammar and when it uses a statistical language model that includes garbage models, in the case of very limited in-domain training data. The results show that a bigram language model and a grammar perform similarly, and that including garbage models in statistical language models improves their performance on both in-coverage and out-of-coverage utterances.