ISCA Archive Interspeech 2015

Annotators' agreement and spontaneous emotion classification performance

Bogdan Vlasenko, Andreas Wendemuth

Combining different types of data can significantly increase the amount of emotional material available for training more reliable real-life emotion classifiers. Two well-known annotation schemes are used for emotional speech: multi-dimensional and category-based. Multi-dimensional annotation is usually applied to label spontaneous emotional events, whereas category-based annotation is used to label acted, “full-blown” emotional chunks. To simulate real-life conditions, we used a cross-corpus evaluation strategy for datasets with different emotional annotation schemes. Emotional models were trained on acted material from the EMO-DB dataset (category-based annotation) and evaluated on spontaneous data from the VAM dataset (multi-dimensional annotation). The best emotion classification performance was obtained on real-life emotional instances with the most intense arousal labels provided by a majority-voting strategy (out of 17 annotators). We find that the spontaneous speech samples with the most intense emotional content are comparable with acted instances. Finally, we address the importance of employing a larger number of emotional annotators.
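
The majority-voting step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each annotator's arousal rating has already been discretised into a small set of levels, and the function name `majority_vote`, the level names, and the example ratings are all hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Return (winning_label, agreement) for one utterance.

    agreement is the fraction of annotators who chose the most frequent
    label. If two or more labels tie for first place, the winning label
    is None (an assumption of this sketch, not taken from the paper).
    """
    if not labels:
        raise ValueError("at least one annotator label is required")
    ranked = Counter(labels).most_common()
    top_label, top_count = ranked[0]
    agreement = top_count / len(labels)
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None, agreement
    return top_label, agreement


# Hypothetical ratings from 17 annotators with discretised arousal levels.
ratings = {
    "utt_a": ["high"] * 12 + ["neutral"] * 4 + ["low"] * 1,
    "utt_b": ["neutral"] * 9 + ["high"] * 5 + ["low"] * 3,
}
labels = {utt: majority_vote(votes) for utt, votes in ratings.items()}
```

The agreement fraction returned alongside each label gives a simple way to rank utterances by annotator consensus, which is one plausible reading of how a larger annotator pool supports more reliable labels.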