ISCA Archive Interspeech 2024

Confidence-aware Hypothesis Transfer Networks for Source-Free Cross-Corpus Speech Emotion Recognition

Jincen Wang, Yan Zhao, Cheng Lu, Hailun Lian, Hongli Chang, Yuan Zong, Wenming Zheng

Source-free cross-corpus speech emotion recognition (SER) aims to transfer emotion knowledge from a source corpus to a target corpus without access to the source data. To address this challenge, we develop a novel method named Confidence-aware Hypothesis Transfer Network (CaHTN), which comprises two modules. The first module, hypothesis implicit transfer, leverages the frozen source classifier (hypothesis) to force target samples to implicitly align with the source hypothesis through information maximization. The second, a bidirectional confident self-training module, exploits both positive and negative pseudo-label information to enhance target feature extraction. To verify its effectiveness, we design twelve source-free cross-corpus SER tasks and conduct extensive experiments on CASIA, EmoDB, EMOVO, and eNTERFACE. Experimental results indicate that CaHTN achieves state-of-the-art performance on source-free cross-corpus SER.
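
The abstract names the two ingredients but gives no formulas, so the following is only an illustrative sketch and not the authors' implementation. It assumes the information maximization objective takes the form commonly used in the hypothesis transfer literature (mean per-sample entropy minus the entropy of the batch-averaged prediction), and that "bidirectional" pseudo labels can be approximated by confidence thresholds. The function names and the thresholds `pos_threshold` and `neg_threshold` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def information_maximization_loss(batch_logits, eps=1e-8):
    """Assumed IM objective: low per-sample entropy, high marginal entropy.

    batch_logits holds the outputs of the frozen source classifier on
    target features. Lower values mean confident and class-balanced
    predictions.
    """
    probs = [softmax(row) for row in batch_logits]
    n, k = len(probs), len(probs[0])
    # Mean entropy of the individual predictions (to be minimized).
    cond_entropy = -sum(p * math.log(p + eps) for row in probs for p in row) / n
    # Negative entropy of the batch-averaged prediction (to be minimized,
    # which pushes the model to use all emotion classes).
    mean_probs = [sum(row[j] for row in probs) / n for j in range(k)]
    neg_marginal_entropy = sum(p * math.log(p + eps) for p in mean_probs)
    return cond_entropy + neg_marginal_entropy


def select_pseudo_labels(probs, pos_threshold=0.9, neg_threshold=0.05):
    """Hypothetical confidence-based selection of pseudo labels.

    Returns (positives, negatives): positives maps a sample index to the
    class it confidently belongs to; negatives maps a sample index to the
    classes it confidently does NOT belong to.
    """
    positives, negatives = {}, {}
    for i, row in enumerate(probs):
        best = max(range(len(row)), key=row.__getitem__)
        if row[best] >= pos_threshold:
            positives[i] = best
        rejected = [j for j, p in enumerate(row) if p <= neg_threshold]
        if rejected:
            negatives[i] = rejected
    return positives, negatives
```

Under these assumptions, a batch whose predictions are confident and spread across classes scores lower than a batch that is uniform or collapsed onto a single class, which is the behavior the information maximization step relies on. The negative labels let low-confidence samples still contribute a training signal by ruling out unlikely emotions.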