ISCA Archive SpeechProsody 2008

Korean MULTEXT: a Korean prosody corpus

Sunhee Kim, Daniel Hirst, Hyongsil Cho, Ho-Young Lee, Minhwa Chung

This paper describes the contents of the Korean prosody corpus (Korean MULTEXT), a Korean version of the speech database Eurom1. The corpus consists of about 2 hours of read speech, transcribed primarily in orthography (in the Korean alphabet and in a Romanized transcription), as well as in IPA and SAMPA. It also includes the original F0 values, stylized F0 values extracted using Momel, and hand-corrected F0 values. The prosodic events are annotated in two ways: automatically, using the INTSINT annotation algorithm, and manually, by labeling them into prosodic units with two tones on the hand-corrected pitch targets. The tone patterns resulting from the proposed Momel-based two-tone labeling are found to correspond to those defined in K-ToBI.

doi: 10.21437/SpeechProsody.2008-33

Cite as: Kim, S., Hirst, D., Cho, H., Lee, H.-Y., Chung, M. (2008) Korean MULTEXT: a Korean prosody corpus. Proc. Speech Prosody 2008, 139-142, doi: 10.21437/SpeechProsody.2008-33

  author={Sunhee Kim and Daniel Hirst and Hyongsil Cho and Ho-Young Lee and Minhwa Chung},
  title={{Korean MULTEXT: a Korean prosody corpus}},
  booktitle={Proc. Speech Prosody 2008},