ISCA Archive Interspeech 2024

Boosting Cross-Corpus Speech Emotion Recognition using CycleGAN with Contrastive Learning

Jincen Wang, Yan Zhao, Cheng Lu, Chuangao Tang, Sunan Li, Yuan Zong, Wenming Zheng

Most classic speech emotion recognition (SER) algorithms succeed only under the premise that training and testing samples are independent and identically distributed. However, this premise does not always hold in practice. In this paper, we therefore propose a novel transfer learning method, the contrastive cycle generative adversarial network (C2GAN), to address cross-corpus SER, where training and testing data originate from different corpora. Specifically, we first adapt CycleGAN to generate synthetic data by transforming samples between the source and target corpora, which enhances the variability of the source data. Then, an emotion-guided contrastive learning module is introduced to jointly optimize original and synthetic data during training, leading to better class-level feature alignment. We conduct experiments on the eNTERFACE, CASIA, and EmoDB datasets under six different settings for evaluation. Extensive results confirm the superior performance of C2GAN over other state-of-the-art methods.
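
To make the idea of emotion-guided contrastive learning more concrete, the following is a minimal sketch of a supervised contrastive loss, in which embeddings that share an emotion label are pulled together and all others are pushed apart. The abstract does not give the exact loss used in C2GAN, so this SupCon-style formulation, the function name `supervised_contrastive_loss`, and the `temperature` parameter are assumptions made purely for illustration. In a setting like the one described, `features` could hold embeddings of both original and synthetic samples stacked into one batch.

```python
import numpy as np


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Illustrative SupCon-style loss (an assumption, not the paper's exact loss).

    features: (n, d) array of embeddings, e.g. original and synthetic
              samples stacked into one batch.
    labels:   (n,) array of emotion class ids.
    Assumes at least one label occurs two or more times in the batch.
    """
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)
    n = len(labels)

    # L2-normalise so that dot products are cosine similarities.
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    logits = z @ z.T / temperature

    # Subtract the row-wise maximum for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)

    # Exclude each anchor's similarity with itself from the denominator.
    not_self = ~np.eye(n, dtype=bool)
    exp_logits = np.exp(logits) * not_self
    log_prob = logits - np.log(exp_logits.sum(axis=1, keepdims=True))

    # Positives: other samples in the batch with the same emotion label.
    positives = (labels[:, None] == labels[None, :]) & not_self
    pos_count = positives.sum(axis=1)
    valid = pos_count > 0  # skip anchors that have no positive in the batch

    mean_log_prob_pos = (log_prob * positives).sum(axis=1)[valid] / pos_count[valid]
    return float(-mean_log_prob_pos.mean())
```

Under these assumptions, the loss is small when samples of the same emotion lie close together in the embedding space, regardless of which corpus they come from, which is one way to encourage the class-level alignment mentioned above.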