ISCA Archive Interspeech 2025

Cross-corpus open-set Speech Emotion Recognition Method Based on Spatiotemporal Features with Inverse-Entropy Regularization

ZhaoHui Zhou, Hui Luo

We propose a method that addresses the performance degradation caused by distribution shift and unknown emotion categories in cross-corpus open-set speech emotion recognition. The method combines spatiotemporal feature extraction with inverse-entropy regularization. First, a deep fusion network extracts long-range spatiotemporal dependencies from emotional audio sequences. To further align the distributions of the source and target corpora, maximum mean discrepancy (MMD) regularization is applied to minimize the distance between their joint distributions. Moreover, we propose an inverse-entropy regularization to learn the discriminative information used to reject known classes. It balances the classification confidence of samples from known and unknown categories in the open-set setting, allowing the model to predict unknown classes while preventing over-prediction. Experimental results show that our method outperforms baseline models across four cross-corpus speech emotion datasets.
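
As a rough illustration of the MMD term mentioned in the abstract, the following minimal sketch computes a biased estimate of the squared MMD between two sets of feature vectors. It is not the authors' implementation. The abstract refers to aligning joint distributions, whereas this sketch covers only the simpler marginal case on features. The Gaussian kernel, the `bandwidth` value, the function names, and the toy data are all assumptions made for illustration.

```python
import math


def gaussian_kernel(x, y, bandwidth):
    """RBF kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs (x, y) with x in xs, y in ys."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(source, target, bandwidth=1.0):
    """Biased estimate of the squared MMD between two sets of feature vectors."""
    return (
        mean_kernel(source, source, bandwidth)
        + mean_kernel(target, target, bandwidth)
        - 2.0 * mean_kernel(source, target, bandwidth)
    )


# Toy 2-D features (hypothetical values, not taken from the paper).
source_feats = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0]]
near_target = [[0.1, 0.0], [0.0, 0.2], [-0.2, 0.1]]
far_target = [[3.0, 3.1], [2.9, 3.2], [3.1, 2.8]]

print(mmd_squared(source_feats, near_target))  # small: the two sets overlap
print(mmd_squared(source_feats, far_target))   # larger: the two sets are far apart
```

In a training setup, a term of this kind would typically be added to the classification loss so that source and target features are pulled toward a shared distribution. How the paper weights it and extends it to joint distributions is not stated in the abstract.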