Dialogue-state dependent language models in automatic inquiry systems can be employed to improve speech recognition and understanding. In this paper, the dialogue state is defined by the set of parameters contained in the system prompt. Using this knowledge, a separate language model for each state can be constructed.
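As a rough illustration of keying one language model per dialogue state, the sketch below groups user utterances by the set of parameters in the preceding system prompt and estimates a maximum-likelihood unigram model for each group. This is a sketch under stated assumptions, not the authors' implementation. The parameter names (`origin`, `destination`), the toy Dutch utterances, and the functions `state_key` and `train_state_unigrams` are all hypothetical. Unigrams are used only to keep the example short.

```python
from collections import Counter, defaultdict


def state_key(prompt_params):
    """Hypothetical: the dialogue state is the unordered set of prompt parameters."""
    return frozenset(prompt_params)


def train_state_unigrams(corpus):
    """Build one ML unigram model per dialogue state.

    corpus: iterable of (prompt_params, utterance_words) pairs (assumed format).
    Returns {state: {word: probability}}.
    """
    counts = defaultdict(Counter)
    for params, words in corpus:
        counts[state_key(params)].update(words)
    models = {}
    for state, c in counts.items():
        total = sum(c.values())
        models[state] = {w: n / total for w, n in c.items()}
    return models


# Toy data: two states, distinguished by which parameter the prompt asks for.
corpus = [
    (["origin"], ["van", "amsterdam"]),
    (["destination"], ["naar", "utrecht"]),
    (["origin"], ["van", "utrecht"]),
]
models = train_state_unigrams(corpus)
```

Here `models[frozenset(["origin"])]["van"]` is 0.5, since "van" accounts for two of the four words observed in that state. Using a `frozenset` means prompts that ask for the same parameters in a different order map to the same state.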
To obtain robust language models, we study two approaches: linear interpolation of all dialogue-state dependent language models, and an automatic text clustering algorithm. In particular, we extend the clustering algorithm so that it automatically determines the optimal number of clusters. The resulting clusters are then combined by linear interpolation.
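Linear interpolation mixes component models as P(w) = Σ_i λ_i P_i(w) with Σ_i λ_i = 1. A common way to choose the weights is expectation-maximization on held-out text. The sketch below uses that standard EM procedure. The abstract does not say how the weights were estimated, so treat EM here as an assumption. The components could be the state-dependent models or cluster-level models. The `floor` value is a hypothetical stand-in for proper smoothing of unseen words, and `em_weights`, `interpolate`, and `perplexity` are illustrative names.

```python
import math


def interpolate(models, weights, word, floor=1e-6):
    """P(w) = sum_i lambda_i * P_i(w); `floor` crudely stands in for smoothing."""
    return sum(lam * m.get(word, floor) for lam, m in zip(weights, models))


def em_weights(models, heldout, iters=50, floor=1e-6):
    """Estimate interpolation weights by EM on held-out words."""
    k = len(models)
    lam = [1.0 / k] * k  # start from uniform weights
    for _ in range(iters):
        expected = [0.0] * k
        for w in heldout:
            # E-step: posterior responsibility of each component for word w
            comps = [l * m.get(w, floor) for l, m in zip(lam, models)]
            z = sum(comps)
            for i in range(k):
                expected[i] += comps[i] / z
        # M-step: new weights are the average responsibilities (they sum to 1)
        lam = [e / len(heldout) for e in expected]
    return lam


def perplexity(models, weights, words, floor=1e-6):
    """Perplexity of the interpolated model on a word sequence."""
    log_sum = sum(math.log(interpolate(models, weights, w, floor)) for w in words)
    return math.exp(-log_sum / len(words))


# Toy component models and held-out data (hypothetical).
m1 = {"van": 0.5, "amsterdam": 0.5}
m2 = {"naar": 0.5, "utrecht": 0.5}
heldout = ["van", "amsterdam", "van", "naar"]
lam = em_weights([m1, m2], heldout)
```

Because three of the four held-out words are explained almost entirely by `m1`, EM moves its weight to roughly 0.75. Each EM iteration cannot decrease held-out likelihood, so the held-out perplexity with the estimated weights is no worse than with the uniform starting weights.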
We present experimental results on a Dutch corpus recorded in the Netherlands with a train timetable information system in the framework of the ARISE project [1]. All of these methods significantly reduce the perplexity, the word error rate, and the attribute error rate.