ISCA Archive Interspeech 2022
ISCA Archive Interspeech 2022

Deep Transductive Transfer Regression Network for Cross-Corpus Speech Emotion Recognition

Yan Zhao, Jincen Wang, Ru Ye, Yuan Zong, Wenming Zheng, Li Zhao

In this paper, we focus on the research of cross-corpus speech emotion recognition (SER), in which the training (source) and testing (target) speech samples come from different corpora leading to a feature distribution gap between them. To solve this problem, we propose a simple yet effective method called deep transductive transfer regression network (DTTRN). The basic idea of DTTRN is to learn a corpus invariant deep neural network to bridge the source and target speech samples and their label information. Following this idea, we make use of a transductive learning way to enforce a deep regressor to build the relationship between the features and emotional labels jointly in both speech corpora. Meanwhile, we also design an emotion guided regularization term for learning DTTRN by aligning source and target speech samples feature distributions from three different scales. Thus, the DTTRN only absorbing the label information provided by source speech samples is able to correctly predict the emotions of the target ones. To evaluate DTTRN, we conduct extensive cross-corpus SER experiments on EmoDB, CASIA, and eNTERFACE corpora. Experimental results show the superior performance of our DTTRN over recent state-of-the-art deep transfer learning methods in dealing with the cross-corpus SER tasks.