ISCA Archive WOCCI 2008

Language model for the web search task in a spoken dialogue system for children

Jumpei Miyake, Shota Takeuchi, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano

In this paper, we propose a method to improve speech recognition accuracy for web search utterances addressed to a spoken dialogue system. The speech data were collected by our speech-oriented information guidance system, "Takemaru-kun" [1], which has been in operation at a public community center since November 2002. Manual labeling shows that child utterances account for about 80% of the collected utterances. Most of the web search utterances consist of out-of-domain words, i.e., trendy words or proper nouns. To adapt the system to a wider domain, we propose expanding the language model and the vocabulary with text collected from various web resources, such as weblogs and open dictionaries. First, we analyze the characteristics of adult and child web search utterances separately. Then, we compare a variety of training corpora for language model construction. Finally, we compare the performance of the resulting language models.
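
The vocabulary expansion idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `oov_rate` and `expand_vocabulary`, the `min_count` threshold, and the whitespace tokenization are all assumptions made for illustration (Japanese text would first need a morphological analyzer to split it into words). The sketch only shows how adding frequent words from web text can lower the out-of-vocabulary rate on search utterances.

```python
from collections import Counter


def oov_rate(utterances, vocabulary):
    """Return the fraction of word tokens that are not in the vocabulary."""
    # Assumption: words are separated by whitespace.
    tokens = [word for utt in utterances for word in utt.split()]
    if not tokens:
        return 0.0
    missing = sum(1 for word in tokens if word not in vocabulary)
    return missing / len(tokens)


def expand_vocabulary(base_vocab, web_texts, min_count=2):
    """Add words seen at least min_count times in web text (hypothetical rule)."""
    counts = Counter(word for text in web_texts for word in text.split())
    frequent = {word for word, count in counts.items() if count >= min_count}
    return set(base_vocab) | frequent


if __name__ == "__main__":
    # Toy data, invented for this example only.
    base_vocab = {"search", "for", "where", "is", "the", "library"}
    utterances = ["search for pokemon", "search for totoro"]
    web_texts = ["pokemon cards", "new pokemon movie",
                 "totoro museum", "totoro plush"]

    expanded = expand_vocabulary(base_vocab, web_texts)
    print("OOV rate before:", oov_rate(utterances, base_vocab))
    print("OOV rate after: ", oov_rate(utterances, expanded))
```

A frequency threshold is used here only to keep one-off noise out of the vocabulary; the choice of threshold and of web sources is exactly the kind of design decision the comparative study of training corpora would inform.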