ISCA Archive Interspeech 2023

Emotion Awareness in Multi-utterance Turn for Improving Emotion Prediction in Multi-Speaker Conversation

Xiaohan Shi, Xingfeng Li, Tomoki Toda

Emotion prediction in conversation (EPC) aims to predict a speaker's future emotional state from context information, which is essential for friendly human-computer conversation. Most prior EPC work models context by merging a speaker's multiple utterances into a single utterance per turn and focuses on dual-speaker conversations. This ignores both the information within a multi-utterance turn and the more complex and natural scenario of multi-speaker conversations. This paper introduces a context information modeling approach that considers the potential emotional interactive information within a speaker's multi-utterance turn, which dominates that speaker's future emotions. Moreover, our approach advances emotion prediction in both dual- and multi-speaker conversations. Experimental results show that this approach significantly enhances context information modeling and yields higher EPC accuracy than previously reported in the literature.