ISCA Archive Interspeech 2006

Feature analysis for emotion recognition from Mandarin speech considering the special characteristics of Chinese language

Yi-hao Kao, Lin-shan Lee

Emotion recognition from speech signals is regarded as a critical step toward intelligent human-machine interfaces. However, the feature parameters useful for this purpose may depend on the specific structure of the language. In this paper we present a detailed analysis of the feature parameters for emotion recognition that takes into account the characteristics of the Chinese language, primarily its monosyllabic structure and tone behavior. The analysis is based on feature parameters at three levels: frame level, syllable level, and word level. The results show that the frame-level and syllable-level features are good indicators, while combining the features from all three levels can yield a recognition accuracy of 90.0%. We also found that the pitch-related and power-related features are the most important, and that the fourth tone in Mandarin serves as the strongest indicator of emotion. All these findings are consistent with the characteristics of Mandarin Chinese.