ISCA Archive Interspeech 2025

Label Semantic-Driven Contrastive Learning for Speech Emotion Recognition

Jiaxi Hu, Leyuan Qu, Haoxun Li, Taihao Li

Speech Emotion Recognition (SER) is crucial for human-computer interaction applications. However, SER remains a challenging task due to limited datasets and ambiguous emotion boundaries. While Self-Supervised Learning (SSL) models have demonstrated considerable success in speech processing tasks, existing approaches still struggle to distinguish subtle emotional variations. In this paper, we propose a novel Label Semantic-driven Contrastive Learning framework (LaSCL) that integrates emotion label semantic embeddings into speech representation learning. Our method uses label embeddings as semantic anchors to explicitly model relationships between emotions and employs a label divergence loss to establish clearer emotion boundaries. Experiments on the widely used IEMOCAP benchmark indicate that LaSCL achieves state-of-the-art performance compared with previous methods.
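
The two ingredients named in the abstract, label embeddings used as semantic anchors and a label divergence loss, can be illustrated with a minimal NumPy sketch. This is not the LaSCL formulation, which the abstract does not specify. It assumes a cross-entropy over cosine similarities between each utterance embedding and every label anchor, and a divergence term equal to the mean pairwise cosine similarity between anchors. The function names, the `temperature` parameter, and both loss forms are hypothetical choices made for illustration only.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def anchor_contrastive_loss(speech_emb, label_emb, targets, temperature=0.1):
    """Assumed form: pull each utterance toward its own label anchor and away
    from the other anchors via softmax cross-entropy over cosine similarities."""
    s = l2_normalize(speech_emb)            # (batch, dim)
    a = l2_normalize(label_emb)             # (num_classes, dim)
    logits = s @ a.T / temperature          # (batch, num_classes)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())


def label_divergence_loss(label_emb):
    """Assumed form: mean cosine similarity between distinct label anchors.
    Minimizing it pushes anchors apart, which sharpens class boundaries."""
    a = l2_normalize(label_emb)
    sim = a @ a.T
    k = sim.shape[0]
    off_diagonal = sim[~np.eye(k, dtype=bool)]  # drop self-similarities
    return float(off_diagonal.mean())


# Toy data: four emotion classes with orthogonal anchors (hypothetical values).
anchors = np.eye(4)
targets = np.array([0, 1, 2, 3])
utterances = anchors[targets]               # embeddings perfectly on their anchors

aligned = anchor_contrastive_loss(utterances, anchors, targets)
misaligned = anchor_contrastive_loss(utterances, anchors, np.roll(targets, 1))
separated = label_divergence_loss(anchors)
collapsed = label_divergence_loss(np.ones((4, 4)))
```

In this toy setup the contrastive term is smaller when utterances sit on their own anchors than when the targets are shifted, and the divergence term is smaller for orthogonal anchors than for identical ones. In a training setup the two terms would typically be combined with a weighting coefficient, which is likewise an assumption here.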