The objective of this study is to understand the relative importance of the different components of speech that contribute to the perception of emotion. The four components considered relate to the vocal tract system, the excitation source, and suprasegmental information (pitch and duration). The study uses data collected from an artist producing speech with different emotions. A flexible analysis-synthesis tool is used to modify the parameters of speech in a desired manner. Results of subjective studies show that all four components are important in perceiving emotion in an utterance in comparison to the corresponding neutral utterance. Individually, the pitch contour seems to be the dominant component, and duration seems to play a less significant role. It is also interesting to note that the perceptual importance of these components varies across different types of emotion.
Index Terms: speech prosody, speech analysis, speech synthesis, emotion conversion, dynamic time warping.