ISCA Archive Interspeech 2023

SOT: Self-supervised Learning-Assisted Optimal Transport for Unsupervised Adaptive Speech Emotion Recognition

Ruiteng Zhang, Jianguo Wei, Xugang Lu, Yongwei Li, Junhai Xu, Di Jin, Jianhua Tao

In cross-domain speech emotion recognition (SER), reducing the global probability distribution distance (GPDD) between domains plays a crucial role in unsupervised domain adaptation (UDA), and this distance can be naturally measured by optimal transport (OT). However, owing to the large intra-class variations of emotion categories, samples lying in overlapping regions of the distributions may induce negative transports. Moreover, OT considers only the GPDD and therefore cannot efficiently transport hard-to-discriminate samples, because it does not exploit the local structures of the intra-class distributions. We propose a self-supervised learning (SSL)-assisted optimal transport (SOT) algorithm for cross-domain SER. First, we regularize OT's transport coupling to mitigate negative transports; then, we design an SSL module that emphasizes local intra-class structure to assist OT in capturing nontransferable knowledge. Cross-domain SER experimental results show that SOT dramatically outperforms state-of-the-art UDA methods.
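To make the notion of a transport coupling concrete, the following is a minimal sketch of entropy-regularized OT (the standard Sinkhorn iteration) between two small sets of feature vectors. It is an illustration only: the entropic regularizer, the squared Euclidean cost, the uniform sample weights, the function name `sinkhorn_coupling`, and the parameters `eps` and `n_iters` are all assumptions made here, not the regularization or the SSL module proposed in the paper.

```python
import numpy as np

def sinkhorn_coupling(source, target, eps=0.1, n_iters=200):
    """Entropy-regularized OT between two feature sets (illustrative only).

    Assumes uniform sample weights and a squared Euclidean cost; this is a
    generic Sinkhorn sketch, not the coupling regularizer used by SOT.
    """
    n, m = len(source), len(target)
    a = np.full(n, 1.0 / n)  # uniform mass on source samples
    b = np.full(m, 1.0 / m)  # uniform mass on target samples

    # Pairwise squared Euclidean cost, scaled to [0, 1] for numerical stability.
    cost = ((source[:, None, :] - target[None, :, :]) ** 2).sum(axis=2)
    cost = cost / cost.max()

    kernel = np.exp(-cost / eps)  # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iters):
        v = b / (kernel.T @ u)  # rescale to match target marginal
        u = a / (kernel @ v)    # rescale to match source marginal

    coupling = u[:, None] * kernel * v[None, :]
    transport_cost = float((coupling * cost).sum())
    return coupling, transport_cost

# Hypothetical toy features standing in for source- and target-domain embeddings.
rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(5, 3))
target = rng.normal(0.5, 1.0, size=(6, 3))
coupling, ot_cost = sinkhorn_coupling(source, target)
```

Each entry `coupling[i, j]` is the mass moved from source sample `i` to target sample `j`, and `ot_cost` is the resulting transport cost under the scaled cost matrix, a global distance of the kind the abstract refers to as the GPDD. Because the coupling is computed from global marginals alone, nothing in this sketch accounts for class structure, which is the limitation the abstract attributes to plain OT.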