ISCA Archive Interspeech 2010

Context-sensitive multimodal emotion recognition from speech and facial expression using bidirectional LSTM modeling

Martin Wöllmer, Angeliki Metallinou, Florian Eyben, Björn Schuller, Shrikanth S. Narayanan

In this paper, we apply a context-sensitive technique for multimodal emotion recognition based on feature-level fusion of acoustic and visual cues. We use bidirectional Long Short-Term Memory (BLSTM) networks which, unlike most other emotion recognition approaches, exploit long-range contextual information to model the evolution of emotion within a conversation. We focus on recognizing dimensional emotional labels, which enables us to classify both prototypical and non-prototypical emotional expressions contained in a large audio-visual database. Subject-independent experiments on various classification tasks reveal that the BLSTM network approach generally outperforms standard classification techniques such as Hidden Markov Models or Support Vector Machines. It achieves F1-measures on the order of 72% for discriminating three clusters in emotional space, 65% for distinguishing three levels of valence, and 55% for distinguishing three levels of activation.
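To make the two core ideas concrete, the following is a minimal NumPy sketch, not the authors' implementation: feature-level fusion is modeled as per-time-step concatenation of acoustic and visual vectors, and a BLSTM runs one LSTM forward and one backward over the fused sequence so that every time step's three-class prediction can draw on both past and future context. All function names (`fuse_features`, `blstm_forward`, `macro_f1`), the dimensions, and the random untrained weights are hypothetical. The sketch assumes both streams are already aligned to the same time steps (the abstract does not say whether a step is a frame or an utterance), and because the abstract does not state how F1 is averaged over classes, `macro_f1` assumes an unweighted mean of per-class F1.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse_features(acoustic, visual):
    # Feature-level fusion: concatenate aligned per-step vectors, (T, Da) + (T, Dv) -> (T, Da + Dv).
    if acoustic.shape[0] != visual.shape[0]:
        raise ValueError("streams must have the same number of time steps")
    return np.concatenate([acoustic, visual], axis=1)

def init_lstm(rng, n_in, n_hidden):
    # One stacked weight matrix for the input, forget and output gates and the cell candidate.
    W = rng.normal(0.0, 0.1, size=(4 * n_hidden, n_in + n_hidden))
    b = np.zeros(4 * n_hidden)
    return W, b

def lstm_pass(x, W, b, n_hidden):
    # Standard LSTM over x of shape (T, D); returns hidden states of shape (T, H).
    h, c = np.zeros(n_hidden), np.zeros(n_hidden)
    states = []
    for x_t in x:
        z = W @ np.concatenate([x_t, h]) + b
        i = sigmoid(z[:n_hidden])
        f = sigmoid(z[n_hidden:2 * n_hidden])
        o = sigmoid(z[2 * n_hidden:3 * n_hidden])
        g = np.tanh(z[3 * n_hidden:])
        c = f * c + i * g          # memory cell carries long-range context
        h = o * np.tanh(c)
        states.append(h)
    return np.stack(states)

def blstm_forward(x, params, n_hidden):
    # Forward LSTM sees the past; backward LSTM (on the reversed sequence) sees the future.
    W_f, b_f = params["fwd"]
    W_b, b_b = params["bwd"]
    fwd = lstm_pass(x, W_f, b_f, n_hidden)
    bwd = lstm_pass(x[::-1], W_b, b_b, n_hidden)[::-1]
    h = np.concatenate([fwd, bwd], axis=1)              # (T, 2H)
    logits = h @ params["W_out"] + params["b_out"]       # (T, n_classes)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerically stable softmax
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

def macro_f1(y_true, y_pred, n_classes):
    # Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN) (averaging scheme assumed).
    scores = []
    for k in range(n_classes):
        tp = np.sum((y_pred == k) & (y_true == k))
        fp = np.sum((y_pred == k) & (y_true != k))
        fn = np.sum((y_pred != k) & (y_true == k))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(scores))

# Usage with random, untrained weights and synthetic features (illustration only).
rng = np.random.default_rng(0)
T, Da, Dv, H, K = 50, 30, 20, 16, 3
acoustic = rng.normal(size=(T, Da))
visual = rng.normal(size=(T, Dv))
x = fuse_features(acoustic, visual)
params = {
    "fwd": init_lstm(rng, Da + Dv, H),
    "bwd": init_lstm(rng, Da + Dv, H),
    "W_out": rng.normal(0.0, 0.1, size=(2 * H, K)),
    "b_out": np.zeros(K),
}
probs = blstm_forward(x, params, H)
pred = probs.argmax(axis=1)
```

The backward pass is what distinguishes this from a unidirectional recurrent model: the prediction at step t depends on the fused features at steps later than t, which is one way to realize the "evolution of emotion within a conversation" context the abstract describes. In practice the weights would be trained, for example by backpropagation through time, rather than drawn at random as here.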