ISCA Archive Interspeech 2025

Multi-task learning for speech emotion recognition in naturalistic conditions

Bartłomiej Zgórzyński, Juliusz Wójtowicz-Kruk, Piotr Masztalski, Władysław Średniawa

This work introduces a multi-encoder framework for speech emotion recognition that is trained jointly on classification and regression. We present our solution for the Interspeech 2025 Speech Emotion Recognition in Naturalistic Conditions Challenge, which uses a multi-modal, multi-encoder architecture with a fusion module. Our results demonstrate the effectiveness of the multi-task approach on both tasks: the system placed in the top 10 in categorical emotion classification and 2nd in emotional attribute prediction among competing teams. Furthermore, an ablation study shows that multi-task learning outperforms separate task-specific training. These findings highlight the potential of multi-task, multi-encoder systems for speech emotion recognition.
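As an illustration of what joint classification and regression training can look like, the following is a minimal sketch of a combined loss for a single utterance. It is not the authors' implementation: the abstract does not specify the loss functions, so the choice of softmax cross-entropy for the emotion category, mean squared error for the emotional attributes, the function names, and the `reg_weight` hyperparameter are all assumptions made for illustration.

```python
import math


def cross_entropy(logits, target_index):
    """Softmax cross-entropy for one example, computed in a numerically stable way."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


def mean_squared_error(predicted, target):
    """Mean squared error averaged over the attribute dimensions of one example."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def joint_loss(logits, target_index, attrs_pred, attrs_true, reg_weight=1.0):
    """Weighted sum of a classification loss and a regression loss.

    reg_weight is a hypothetical hyperparameter (assumption, not from the paper)
    that balances the two tasks sharing the same model.
    """
    classification = cross_entropy(logits, target_index)
    regression = mean_squared_error(attrs_pred, attrs_true)
    return classification + reg_weight * regression


# Example: two emotion classes, two attribute dimensions.
loss = joint_loss([0.0, 0.0], 0, [1.0, 2.0], [1.0, 4.0], reg_weight=0.5)
```

In a multi-task setup of this kind, both terms backpropagate through the shared encoders and fusion module, so each task acts as a regularizer for the other; separate task-specific training would instead optimize each term with its own model.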