ISCA Archive Interspeech 2024

LoRA-MER: Low-Rank Adaptation of Pre-Trained Speech Models for Multimodal Emotion Recognition Using Mutual Information

Yunrui Cai, Zhiyong Wu, Jia Jia, Helen Meng

Multimodal emotion recognition (MER) is crucial for machines to understand human intentions. Although many deep learning models have been proposed, MER still faces practical challenges. The key challenge is extracting high-dimensional features that are genuinely relevant to emotion; another is modeling multimodal features effectively while balancing cross-modal similarity against diversity. In this paper, we propose LoRA-MER, a mutual-information-based method for MER. We fine-tune a pre-trained speech model with the Low-Rank Adaptation (LoRA) strategy and use a frozen pre-trained text model to robustly extract emotional features. We further adopt a multimodal fusion approach based on Mutual Information Neural Estimation (MINE) to strengthen the correlation between the speech and text features. Experimental results demonstrate the effectiveness of each proposed module, and our model outperforms state-of-the-art speaker-independent approaches on the IEMOCAP dataset.
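To make the LoRA strategy concrete, the following is a minimal PyTorch sketch of how a low-rank update can be attached to one frozen linear projection inside a pre-trained speech encoder. The rank r, scaling alpha, and the 768-dimensional hidden size are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen pre-trained linear layer plus a trainable low-rank update."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # freeze the pre-trained weights
            p.requires_grad = False
        # Low-rank factors: A is small random, B is zero so training starts
        # from the original pre-trained behavior.
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = W0 x + (B A) x * scaling; only A and B receive gradients
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

# Usage: wrap a projection of a frozen encoder layer (dimensions assumed)
layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(4, 100, 768))  # (batch, frames, hidden)
```

Because only the low-rank factors are updated, the number of trainable parameters in the speech encoder stays small relative to full fine-tuning.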
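The MINE-based fusion objective can likewise be sketched. A statistics network T scores paired speech/text features, and the Donsker-Varadhan bound E_p(x,y)[T] - log E_p(x)p(y)[exp(T)] lower-bounds their mutual information; maximizing this estimate encourages correlated multimodal representations. The network sizes and feature dimensions below are assumptions for illustration.

```python
import math
import torch
import torch.nn as nn

class MINE(nn.Module):
    """Statistics network for the Donsker-Varadhan MI lower bound."""
    def __init__(self, speech_dim: int = 768, text_dim: int = 768, hidden: int = 256):
        super().__init__()
        self.T = nn.Sequential(
            nn.Linear(speech_dim + text_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Joint expectation over aligned (speech, text) pairs
        joint = self.T(torch.cat([s, t], dim=-1)).mean()
        # Product-of-marginals samples: shuffle the text batch to break pairing
        t_marg = t[torch.randperm(t.size(0))]
        marginal = self.T(torch.cat([s, t_marg], dim=-1))
        # I(S;T) >= E_joint[T] - log E_marginal[exp(T)]
        return joint - (torch.logsumexp(marginal, dim=0)
                        - math.log(marginal.size(0))).squeeze()

# Usage: the negated MI estimate can serve as an auxiliary loss term so that
# fusion training maximizes speech-text correlation (setup assumed)
mine = MINE()
s, t = torch.randn(32, 768), torch.randn(32, 768)  # utterance-level features
loss_mi = -mine(s, t)
```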