ISCA Archive Interspeech 2024

An Inter-Speaker Fairness-Aware Speech Emotion Regression Framework

Hsing-Hang Chou, Woan-Shiuan Chien, Ya-Tse Wu, Chi-Chun Lee

Speech emotion recognition (SER) helps achieve better human-machine interaction in voice technologies. Recent studies have pointed out critical fairness issues in SER. Although there have been efforts to build fair SER systems, most of this work focuses on fairness between demographic groups and relies on these broad categorical attributes. In this paper, we instead focus on fairness learning among individual speakers, which is rarely discussed yet much more intuitively appealing for constructing a fair SER model. To reduce the reliance on known speaker IDs, we perform unsupervised clustering on utterance embeddings from a pretrained speaker verification model; the clustering separates utterances with different characteristics into clusters that roughly correspond to the true speaker identities. Our evaluation demonstrates that, with these cluster IDs, we can construct a fairness-aware SER model at the individual speaker level without knowing speaker IDs upfront.
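The idea of replacing speaker IDs with cluster IDs can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm or the speaker verification model, so everything here is an assumption: plain k-means stands in for the unsupervised clustering, small synthetic vectors stand in for the utterance embeddings, and the names `make_synthetic_embeddings`, `farthest_point_init`, and `kmeans` are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def make_synthetic_embeddings(seed=0):
    """Assumption: 3 hypothetical speakers, 10 utterances each, 4-D embeddings.

    Real speaker-verification embeddings are much higher-dimensional;
    these toy vectors only mimic the property that utterances from the
    same speaker lie close together.
    """
    rng = random.Random(seed)
    speaker_centers = [[0, 0, 0, 0], [5, 5, 0, 0], [0, 5, 5, 0]]
    embeddings, true_speakers = [], []
    for speaker, center in enumerate(speaker_centers):
        for _ in range(10):
            embeddings.append([c + rng.gauss(0, 0.3) for c in center])
            true_speakers.append(speaker)
    return embeddings, true_speakers


def farthest_point_init(points, k):
    """Deterministic init: start at the first point, then repeatedly add
    the point that is farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, iterations=20):
    """Plain k-means (an assumed stand-in for the paper's clustering step).

    Returns one cluster ID per point; no speaker labels are used.
    """
    centroids = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each utterance goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


embeddings, true_speakers = make_synthetic_embeddings()
cluster_ids = kmeans(embeddings, k=3)  # pseudo speaker IDs, learned without labels
```

In a pipeline of the kind the abstract describes, `cluster_ids` would take the place of speaker IDs wherever the fairness objective needs a per-speaker grouping. The number of clusters `k` is a free parameter in this sketch; how it is chosen in practice is not stated in the abstract.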