ISCA Archive Interspeech 2006

Detection of word fragments in Mandarin telephone conversation

Cheng-Tao Chu, Yun-Hsuan Sung, Yuan Zhao, Daniel Jurafsky

We describe preliminary work on detecting word fragments in Mandarin conversational telephone speech. We extracted prosodic, voice quality, and lexical features, and trained decision tree and support vector machine (SVM) classifiers. Previous research has shown that glottalization features are important for fragment detection in English. However, we show that Mandarin fragments are quite different from English fragments: 90% of Mandarin fragments are immediately followed by a repetition of the fragmentary word. These repetition fragments are not glottalized, and they have a very specific distribution: the 12 most frequent words ("you", "I", "that", "have", "then", etc.) account for 50% of the tokens of these fragments. Thus, rather than glottalization, the most useful feature for Mandarin fragment detection was the identity of the neighboring character (word or morpheme). In an oracle experiment using the true (reference) neighboring words together with prosodic and voice quality features, we achieved 80% accuracy in Mandarin fragment detection.
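The abstract reports that the strongest cue is the identity of the neighboring character, because most Mandarin fragments are immediately followed by a repetition of the word. The stdlib-only Python sketch below shows one way such lexical neighbor features, and the "top-k words cover X% of fragment tokens" statistic, might be computed from a tokenized transcript. The function names, the `<s>`/`</s>` padding symbols, the prefix-based repetition test, and the toy tokens are all hypothetical illustrations, not the authors' feature set or data.

```python
from collections import Counter
from typing import Dict, List, Union

Feature = Union[str, bool]


def neighbor_features(tokens: List[str], i: int) -> Dict[str, Feature]:
    """Hypothetical lexical features for a fragment candidate at position i.

    Records the identity of the previous and next tokens, plus a flag that
    is True when the next token begins with the candidate (an immediate
    repetition, the pattern the abstract says follows most fragments).
    """
    prev_tok = tokens[i - 1] if i > 0 else "<s>"              # assumed start padding
    next_tok = tokens[i + 1] if i + 1 < len(tokens) else "</s>"  # assumed end padding
    return {
        "prev": prev_tok,
        "next": next_tok,
        "repeats_next": next_tok.startswith(tokens[i]),
    }


def top_k_coverage(fragment_words: List[str], k: int) -> float:
    """Fraction of fragment tokens whose word is among the k most frequent."""
    if not fragment_words:
        return 0.0
    counts = Counter(fragment_words)
    covered = sum(count for _, count in counts.most_common(k))
    return covered / len(fragment_words)


if __name__ == "__main__":
    # Toy utterance (invented): "I I think that that ..." with repetitions.
    utterance = ["我", "我", "觉得", "那个", "那个", "很", "好"]
    for pos in (0, 3, 5):
        print(pos, neighbor_features(utterance, pos))

    # Toy list of fragmentary words (invented), glossed "I", "you", "that", "have".
    fragments = ["我", "我", "你", "那个", "有"]
    print(top_k_coverage(fragments, 1))
```

String-valued features such as `prev` and `next` would typically be one-hot encoded before being passed to a decision tree or SVM; in a real system the reference (oracle) tokens would be replaced by recognizer output, and prosodic and voice quality features would be appended to the same feature vector.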

doi: 10.21437/Interspeech.2006-100

Cite as: Chu, C.-T., Sung, Y.-H., Zhao, Y., Jurafsky, D. (2006) Detection of word fragments in Mandarin telephone conversation. Proc. Interspeech 2006, paper 1730-Thu1CaP.9, doi: 10.21437/Interspeech.2006-100

  author={Cheng-Tao Chu and Yun-Hsuan Sung and Yuan Zhao and Daniel Jurafsky},
  title={{Detection of word fragments in Mandarin telephone conversation}},
  booktitle={Proc. Interspeech 2006},
  pages={paper 1730-Thu1CaP.9},