ISCA Archive Eurospeech 1997

A Korean speech corpus for train ticket reservation aid system based on speech recognition

Woosung Kim, Myoung-Wan Koo

This paper describes a Korean speech corpus for a train ticket reservation aid system based on speech recognition. Two speech corpora were collected: one based on human-human (H-H) dialogues and the other based on human-computer (H-C) dialogues. A Wizard of Oz (WOZ) experiment was carried out to collect the H-C spoken-dialogue corpus. Data from a total of 298 speakers were collected for the H-C corpus, and data from 100 speakers were collected for the H-H corpus. Since the basic grammatical unit in Korean is the morpheme, a morpheme-based Korean language model was designed in addition to a word-based language model. Linguistic analysis shows that people respond differently when talking to a computer than when talking to a human. In addition, language-model analysis reveals that the morpheme-based language model reduces perplexity (PP) by 50% relative to the word-based one.
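The word-based and morpheme-based models are compared by perplexity, PP = exp(-(1/N) Σ log P(w_i | w_{i-1})), where the w_i are the tokens (words or morphemes) of the test text. The sketch below shows how such a figure can be computed. It assumes a bigram model with add-one (Laplace) smoothing; the abstract does not state the n-gram order or smoothing method actually used, and the function names `train_bigram` and `perplexity` are hypothetical.

```python
import math
from collections import Counter


def train_bigram(tokens):
    """Collect bigram statistics from one token stream (words or morphemes).

    Assumption: a bigram model padded with <s>/</s> sentence markers.
    """
    padded = ["<s>"] + tokens + ["</s>"]
    history_counts = Counter(padded[:-1])                 # counts of each conditioning token
    bigram_counts = Counter(zip(padded[:-1], padded[1:]))  # counts of (prev, cur) pairs
    vocab = set(tokens) | {"</s>"}                         # tokens that can be predicted
    return history_counts, bigram_counts, vocab


def perplexity(model, tokens):
    """Per-token perplexity of `tokens` under an add-one smoothed bigram model.

    Assumption: test tokens come from the training vocabulary (no OOV handling).
    """
    history_counts, bigram_counts, vocab = model
    v = len(vocab)
    padded = ["<s>"] + tokens + ["</s>"]
    log_prob = 0.0
    n = 0
    for prev, cur in zip(padded[:-1], padded[1:]):
        # Add-one smoothing: (c(prev, cur) + 1) / (c(prev) + |V|)
        p = (bigram_counts[(prev, cur)] + 1) / (history_counts[prev] + v)
        log_prob += math.log(p)
        n += 1
    return math.exp(-log_prob / n)


if __name__ == "__main__":
    model = train_bigram(["a", "b"])
    print(perplexity(model, ["a", "b"]))  # seen order: lower perplexity
    print(perplexity(model, ["b", "a"]))  # unseen order: higher perplexity
```

The same two functions apply to either model; only the tokenization of the input changes (a stream of space-delimited words versus a stream of morphemes produced by a morphological analyzer, which this sketch does not include).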