ISCA Archive SpeechProsody 2008

Some non-F0 cues to emotional speech: an experiment with morphing

Donna Erickson, Takaaki Shochi, Caroline Menezes, Hideki Kawahara, Ken-Ichi Sakakibara

This paper investigates some non-F0 cues to emotional speech. Two samples of the word "leave" were collected from spontaneous speech: one spoken with emotion (sad) and the other without emotion. Using the morphing algorithm of STRAIGHT [1], we morphed a series of 12 utterances, from the non-emotional "leave" to the emotional "leave", keeping F0 at 300 Hz. Perception test results show that the morphed speech sounds could be identified as sad, with stimulus 12 being heard as most emotional. The results of a simple correlation, together with a PCA of listeners' perceptual behavior, suggest that formant frequencies, specifically the lowering of F2, F3, and F4, are important cues for the perception of emotional (sad) speech.
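
The 12-step continuum described above can be illustrated with a minimal sketch. This is not the STRAIGHT algorithm, which morphs full time-frequency representations; it only shows the idea of stepping a parameter set from one endpoint to the other while F0 stays fixed. The evenly spaced morphing rates, the function names, and the endpoint formant values are all assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict, List

FIXED_F0_HZ = 300.0  # from the abstract: F0 is held at 300 Hz
N_STIMULI = 12       # from the abstract: a series of 12 morphed utterances


def morph_rates(n: int = N_STIMULI) -> List[float]:
    """Return n morphing rates from 0.0 (neutral) to 1.0 (sad).

    Even spacing is an assumption; the abstract does not state the step size.
    """
    return [i / (n - 1) for i in range(n)]


def morph_parameters(neutral: Dict[str, float],
                     sad: Dict[str, float],
                     rate: float) -> Dict[str, float]:
    """Linearly interpolate each parameter between the two endpoints."""
    return {k: (1.0 - rate) * neutral[k] + rate * sad[k] for k in neutral}


# Hypothetical placeholder formant values in Hz, NOT measurements from the paper.
neutral_leave = {"F2": 2300.0, "F3": 3000.0, "F4": 4000.0}
sad_leave = {"F2": 2100.0, "F3": 2800.0, "F4": 3800.0}

# Build the continuum: stimulus 1 is the neutral endpoint, stimulus 12 the sad one.
continuum = [
    {"F0": FIXED_F0_HZ, **morph_parameters(neutral_leave, sad_leave, r)}
    for r in morph_rates()
]

for number, stimulus in enumerate(continuum, start=1):
    print(number, {k: round(v, 1) for k, v in stimulus.items()})
```

With these placeholder endpoints, F2, F3, and F4 fall step by step across the series while F0 remains constant, which is the kind of non-F0 variation the perception test is designed to isolate.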

[1] Kawahara, H., Matsui, H. (2003) Auditory morphing based on an elastic perceptual distance metric in an interference-free time-frequency representation. Proc. IEEE ICASSP 2003.

doi: 10.21437/SpeechProsody.2008-149

Cite as: Erickson, D., Shochi, T., Menezes, C., Kawahara, H., Sakakibara, K.-I. (2008) Some non-F0 cues to emotional speech: an experiment with morphing. Proc. Speech Prosody 2008, 677-680, doi: 10.21437/SpeechProsody.2008-149

  author={Donna Erickson and Takaaki Shochi and Caroline Menezes and Hideki Kawahara and Ken-Ichi Sakakibara},
  title={{Some non-F0 cues to emotional speech: an experiment with morphing}},
  booktitle={Proc. Speech Prosody 2008},