ISCA Archive Interspeech 2024

SAMSEMO: New dataset for multilingual and multimodal emotion recognition

Pawel Bujnowski, Bartlomiej Kuzma, Bartlomiej Paziewski, Jacek Rutkowski, Joanna Marhula, Zuzanna Bordzicka, Piotr Andruszkiewicz

The task of emotion recognition from image, audio and text modalities has recently gained popularity due to its many potential applications. However, the list of large-scale multimodal datasets remains very short, and all available datasets have significant limitations. We present SAMSEMO, a novel dataset for multimodal and multilingual emotion recognition. Our collection of over 23k video scenes is multilingual, covering five languages (EN, DE, ES, PL and KO). The scenes are heterogeneous: they come from diverse sources and are accompanied by rich, manually collected metadata and emotion annotations. In the paper, we also study the valence and arousal of audio features in our data for the most important emotion classes and compare them with the features of the CMU-MOSEI data. Moreover, we perform multimodal emotion recognition experiments on SAMSEMO and show how a multilingual model can improve the detection of imbalanced classes.
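The abstract mentions improving the detection of imbalanced emotion classes. The paper's actual method is not described here, but a standard remedy that such setups commonly build on is inverse-frequency class weighting of the training loss. The sketch below is a minimal, hypothetical illustration of that idea; the function name and the toy label distribution are assumptions for illustration only, not SAMSEMO statistics.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Per-class weights inversely proportional to class frequency.

    Hypothetical helper: rare classes (e.g. 'angry', 'sad') get larger
    weights so a classifier's loss penalizes mistakes on them more.
    """
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    # weight(c) = total / (n_classes * count(c)); a balanced set gives 1.0 everywhere
    return {cls: total / (n_classes * n) for cls, n in counts.items()}

# Toy, illustrative emotion label distribution (not real SAMSEMO counts)
labels = ["neutral"] * 8 + ["happy"] * 4 + ["angry"] * 2 + ["sad"] * 2
weights = inverse_frequency_weights(labels)
# The majority class is down-weighted, minority classes are up-weighted,
# e.g. weights["neutral"] == 0.5 and weights["angry"] == 2.0 here.
```

Such weights are typically passed to a weighted cross-entropy loss during training; pooling data across languages, as the multilingual model here does, additionally raises the absolute number of examples seen for rare classes.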