ISCA Archive Interspeech 2023

Fine-tuned RoBERTa Model with a CNN-LSTM Network for Conversational Emotion Recognition

Jiachen Luo, Huy Phan, Joshua Reiss

Textual emotion recognition in conversations has gained increasing attention in recent years owing to the growing range of applications it can serve, e.g., human-robot interaction and recommender systems. However, most existing approaches are either based on BERT-style models, which fail to exploit crucial long-range context information, or resort to complex entanglements of neural network architectures, resulting in less stable training procedures and slower inference. To bridge this gap, we propose a fast, compact, and parameter-efficient framework that combines a fine-tuned pre-trained RoBERTa model with a CNN-LSTM network for textual emotion recognition in conversations. First, we fine-tune the pre-trained RoBERTa model to effectively learn long-term emotion-relevant context information. Second, a convolutional neural network coupled with a bidirectional long short-term memory network and joint reinforced blocks is used to recognize emotion in conversations. Extensive experiments are conducted on the benchmark MELD emotion dataset, and the results show that our model outperforms a wide range of strong baselines and achieves competitive results with state-of-the-art approaches.
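The pipeline described above (contextual embeddings from a fine-tuned RoBERTa, followed by a convolutional layer and a bidirectional LSTM feeding an emotion classifier) can be sketched as below. This is a minimal illustration, not the authors' implementation: all dimensions, layer sizes, and the pooling choice are assumptions, and the RoBERTa last-layer hidden states are simulated with a random tensor rather than produced by an actual encoder.

```python
import torch
import torch.nn as nn


class CNNBiLSTMClassifier(nn.Module):
    """Hypothetical CNN + BiLSTM head over RoBERTa hidden states.

    Hyperparameters are illustrative; the abstract does not specify them.
    MELD annotates seven emotion classes, hence num_emotions=7.
    """

    def __init__(self, hidden_dim=768, conv_channels=128,
                 lstm_hidden=64, num_emotions=7):
        super().__init__()
        # 1-D convolution over the token axis extracts local n-gram features
        self.conv = nn.Conv1d(hidden_dim, conv_channels,
                              kernel_size=3, padding=1)
        # BiLSTM captures longer-range dependencies over the conv features
        self.lstm = nn.LSTM(conv_channels, lstm_hidden,
                            batch_first=True, bidirectional=True)
        # Linear classifier over the final BiLSTM state
        self.fc = nn.Linear(2 * lstm_hidden, num_emotions)

    def forward(self, roberta_hidden):
        # roberta_hidden: (batch, seq_len, hidden_dim)
        x = self.conv(roberta_hidden.transpose(1, 2)).relu()  # (B, C, T)
        out, _ = self.lstm(x.transpose(1, 2))                 # (B, T, 2H)
        return self.fc(out[:, -1])                            # (B, classes)


# Stand-in for fine-tuned RoBERTa last-layer hidden states
# (in practice these would come from e.g. RobertaModel's output)
fake_hidden = torch.randn(2, 20, 768)
logits = CNNBiLSTMClassifier()(fake_hidden)
print(logits.shape)  # torch.Size([2, 7])
```

In a real setup, `fake_hidden` would be replaced by the encoder's token-level hidden states, and the whole stack would be trained end-to-end on the MELD utterances with a cross-entropy loss over the seven emotion labels.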