ISCA Archive Interspeech 2023
ISCA Archive Interspeech 2023

Emotion Prompting for Speech Emotion Recognition

Xingfa Zhou, Min Li, Lan Yang, Rui Sun, Xin Wang, Huayi Zhan

Speech Emotion Recognition (SER) classifies speech into emotion categories such as: Happy, Angry. Most prior works for SER focused on how to mine compelling features to improve performance. However, these methods ignore the influence of emotional label information on SER. Recent studies have attempted to prompt pre-trained language models and yield good performance for NLP tasks. Nevertheless, few works have attempted to prompt pre-trained speech models (PSM) on speech tasks. In light of these, we propose a simple but effective prompt-based method that prompts PSM for SER. Firstly, we reframe SER as an entailment task. Next, we generate speech prompts and combine them with the raw audio to form the input for PSM. Finally, we build a multi-task learning framework to extract more compelling features by simultaneously performing automatic speech recognition (ASR) and SER. Experiments on the IEMOCAP benchmark show that our method outperforms state-of-the-art baselines on the SER task.