Multi-Modal Multi-Task Affective States Recognition Based on Label Encoder Fusion
Maxim Markitantov, Elena Ryumina, Heysem Kaya, Alexey Karpov
Despite recent advances in multi-modal approaches, recognizing the full range of human affective states, including emotions and sentiments, remains challenging due to complex interactions between different modalities and the hierarchical nature of affective states. This work presents a novel approach for multi-modal multi-task emotion and sentiment recognition that integrates audio, video, and text data. We introduce a Label Encoder Fusion Strategy, which produces and processes uni-modal emotion and sentiment predictions and uses them alongside modality-specific features during the fusion process to provide additional contextual information. We conduct elaborate multi-corpus experiments on the RAMAS, MELD, and CMU-MOSEI corpora. The proposed approach achieves state-of-the-art performance in both affective tasks. On MELD, we achieve a macro F1 (MF) of 40.9% and 67.02% for emotion and sentiment recognition, respectively. On CMU-MOSEI, the mean MF is 62.30% and the MF is 62.00% for the same tasks, respectively.
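For readers unfamiliar with the reported metric, the following is a minimal sketch of macro F1 for a single-label task: the per-class F1 scores are computed and then averaged with equal weight per class. The function name `macro_f1` and the toy labels are illustrative assumptions, not taken from the paper, and the sketch does not cover the paper's exact evaluation protocol (for example, how the "mean MF" on CMU-MOSEI is aggregated).

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative helper, not from the paper)."""
    # Assumption: score every class that appears in either the references or the predictions.
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); a class with no support and no predictions scores 0.
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy sentiment labels (hypothetical data for illustration only).
y_true = ["pos", "pos", "neg", "neu", "neu", "neu"]
y_pred = ["pos", "neg", "neg", "neu", "neu", "pos"]
score = macro_f1(y_true, y_pred)
print(f"macro F1 = {score:.4f}")
```

Because every class contributes equally, macro F1 penalizes poor performance on rare classes more than accuracy or a support-weighted F1 would, which matters for imbalanced emotion corpora.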
Erratum
Figure 2 does not accurately represent the proposed approach and significantly distorts the key ideas and contributions of the paper. The current illustration does not include the Label Encoder with averaging, although the caption is correct. The corrected figure is shown here:
Figure 2: Label Encoder Fusion Strategy with Averaging.
@inproceedings{markitantov25_interspeech,
title = {{Multi-Modal Multi-Task Affective States Recognition Based on Label Encoder Fusion}},
author = {Maxim Markitantov and Elena Ryumina and Heysem Kaya and Alexey Karpov},
year = {2025},
booktitle = {{Interspeech 2025}},
pages = {3010--3014},
doi = {10.21437/Interspeech.2025-2060},
issn = {2958-1796},
}
Cite as: Markitantov, M., Ryumina, E., Kaya, H., Karpov, A. (2025) Multi-Modal Multi-Task Affective States Recognition Based on Label Encoder Fusion. Proc. Interspeech 2025, 3010-3014, doi: 10.21437/Interspeech.2025-2060