Features in the i-vector space have proven very successful for speaker and language identification. In this paper, we evaluate the i-vector framework for emotion recognition from speech. Instead of using standard i-vector features, we propose concatenated emotion-specific i-vector features. For each emotion category, a GMM supervector is generated by adapting the neutral one, which is trained on a large corpus. An i-vector is then extracted with respect to each emotion-specific GMM supervector. The concatenation of these emotion-dependent i-vectors is used as the feature vector in an SVM model for emotion classification. Our experimental results on acted and spontaneous data sets demonstrate that the proposed method outperforms baselines that use either static or dynamic features.
Index Terms: emotion recognition, i-vector
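
As a minimal sketch of the concatenation step described above, the following Python fragment joins one i-vector per emotion category into a single feature vector. The emotion labels, the three-dimensional toy vectors, and the helper name `concatenate_ivectors` are all assumptions made for illustration; i-vector extraction and the SVM classifier are not shown.

```python
# Hypothetical toy i-vectors for a single utterance, one per emotion-specific
# GMM supervector. The labels, values, and dimensionality (3) are made up.
ivectors_by_emotion = {
    "angry":   [0.2, -0.1, 0.5],
    "happy":   [0.0, 0.3, -0.4],
    "neutral": [0.1, 0.1, 0.0],
    "sad":     [-0.3, 0.2, 0.6],
}


def concatenate_ivectors(ivectors, emotion_order):
    """Concatenate per-emotion i-vectors, in a fixed order, into one list."""
    features = []
    for emotion in emotion_order:
        features.extend(ivectors[emotion])
    return features


# A fixed emotion order keeps feature positions consistent across utterances.
emotion_order = sorted(ivectors_by_emotion)
feature_vector = concatenate_ivectors(ivectors_by_emotion, emotion_order)
```

With K emotion categories and D-dimensional i-vectors, the resulting vector has K x D entries; it is this vector that would be passed to the classifier.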