ISCA Archive Interspeech 2005

A human-human train timetable dialogue corpus

Filip Jurcicek, Jiri Zahradil, Libor Jelinek

This paper describes progress in the development of a human-human dialogue corpus. The corpus contains transcribed phone calls from users to a train timetable information center, in which callers ask about their train travel plans. It builds on transcriptions of user inquiries that were previously collected for a train timetable information center. We enriched these transcriptions with dialogue act tags, which include an abstract semantic annotation. The corpus comprises recorded speech of both operators and users, an orthographic transcription, a normalized transcription, a normalized transcription with named entities, and dialogue act tags with abstract semantic annotation. We propose a combination of a dialogue act tagset and an abstract semantic annotation, and we describe and apply a technique for dialogue act tagging and abstract semantic annotation.