Combining different types of data can significantly increase the amount of emotional material available for training more reliable real-life emotion classifiers. Two well-known annotation schemes are used for emotional speech: multi-dimensional and category-based. Multi-dimensional annotation is usually applied to label spontaneous emotional events, whereas category-based annotation is used to label acted, “full-blown” emotional chunks. To simulate real-life conditions, we used a cross-corpus evaluation strategy on datasets with different emotional annotation schemes. Emotion models were trained on acted material from the EMO-DB dataset (category-based annotation) and evaluated on spontaneous data from the VAM dataset (multi-dimensional annotation). The best emotion classification performance was obtained on the real-life instances with the most intense arousal labels, as determined by majority voting among 17 annotators. We find that the spontaneous speech samples with the most intense emotional content are comparable to acted instances. Finally, we address the importance of employing a larger number of emotion annotators.
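The selection of "most intense" instances by majority voting can be illustrated with a minimal sketch. It assumes that each spontaneous instance carries one discrete arousal rating per annotator on a five-point scale {-1, -0.5, 0, 0.5, 1}, that only the two extreme points count as "most intense", and that instances with a tied vote are discarded. The function names `majority_label` and `most_intense`, the tie rule, and the choice of extreme levels are all hypothetical illustrations, not necessarily the procedure used in this study.

```python
from collections import Counter
from typing import Dict, Optional, Sequence, Tuple


def majority_label(ratings: Sequence[float]) -> Optional[float]:
    """Return the most frequent rating, or None when the top count is tied.

    Hypothetical tie rule: a tied vote yields no majority label.
    """
    counts = Counter(ratings).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def most_intense(
    instances: Dict[str, Sequence[float]],
    extreme_levels: Tuple[float, ...] = (-1.0, 1.0),  # assumed "most intense" levels
) -> Dict[str, float]:
    """Keep only instances whose majority arousal label is an extreme level."""
    selected = {}
    for name, ratings in instances.items():
        label = majority_label(ratings)
        if label is not None and label in extreme_levels:
            selected[name] = label
    return selected


if __name__ == "__main__":
    # Hypothetical ratings from 17 annotators per instance (not real VAM data).
    ratings = {
        "utt_a": [1.0] * 10 + [0.5] * 7,   # clear high-arousal majority
        "utt_b": [0.0] * 9 + [0.5] * 8,    # majority label is neutral
        "utt_c": [-1.0] * 12 + [-0.5] * 5, # clear low-arousal majority
    }
    print(most_intense(ratings))
```

With a larger pool of annotators, ties become less likely and the majority label is less sensitive to any single rater, which is one way to see why the number of annotators matters for this kind of selection.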