ISCA Archive Interspeech 2021

Graph Isomorphism Network for Speech Emotion Recognition

Jiawang Liu, Haoxiang Wang

Deep learning approaches such as convolutional neural networks (CNNs) and long short-term memory (LSTM) networks have been widely used in speech emotion recognition (SER). These approaches generally model speech signals in Euclidean space. In this paper, we propose a novel SER model (LSTM-GIN) that applies a Graph Isomorphism Network (GIN) to LSTM outputs for global emotion modeling in non-Euclidean space. In our LSTM-GIN model, speech signals are represented as graph-structured data so that a global feature representation can be extracted more effectively. The deep frame-level features generated by a bidirectional LSTM are converted into an undirected graph whose nodes are the frame-level features and whose edges are defined by the temporal relations between speech frames. GIN is adopted to classify the graph representations of utterances because it shows excellent discriminative power in comparative experiments. We conduct experiments on the IEMOCAP dataset, and the results show that the proposed LSTM-GIN model surpasses other recent graph-based models and deep learning models, achieving a weighted accuracy (WA) of 64.65% and an unweighted accuracy (UA) of 65.53%.
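The graph construction and the GIN aggregation described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each frame is linked only to its immediate temporal neighbours (the abstract does not specify the exact edge rule), it uses the standard GIN sum aggregation, (1 + eps) * h_v plus the sum of the neighbour features, and it omits the learned MLP that GIN applies after aggregation. The function names `build_temporal_graph`, `gin_aggregate`, and `sum_readout` are hypothetical, and the toy vectors stand in for bidirectional LSTM outputs.

```python
from typing import List

Vector = List[float]


def build_temporal_graph(num_frames: int) -> List[List[int]]:
    """Return adjacency lists of an undirected chain graph over frames.

    Assumption: frame t is connected only to frames t-1 and t+1.
    """
    neighbors: List[List[int]] = [[] for _ in range(num_frames)]
    for t in range(num_frames - 1):
        neighbors[t].append(t + 1)
        neighbors[t + 1].append(t)
    return neighbors


def gin_aggregate(features: List[Vector],
                  neighbors: List[List[int]],
                  eps: float = 0.0) -> List[Vector]:
    """Compute (1 + eps) * h_v + sum of neighbor features for every node.

    A full GIN layer would pass each result through a learned MLP,
    which is left out of this sketch.
    """
    aggregated: List[Vector] = []
    for v, h_v in enumerate(features):
        acc = [(1.0 + eps) * x for x in h_v]
        for u in neighbors[v]:
            acc = [a + b for a, b in zip(acc, features[u])]
        aggregated.append(acc)
    return aggregated


def sum_readout(features: List[Vector]) -> Vector:
    """Sum all node vectors into a single graph-level vector."""
    dim = len(features[0])
    return [sum(h[d] for h in features) for d in range(dim)]


# Toy stand-in for three frames of 2-dimensional LSTM outputs.
frames = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
graph = build_temporal_graph(len(frames))
node_repr = gin_aggregate(frames, graph, eps=0.0)
utterance_repr = sum_readout(node_repr)
```

In this sketch, `utterance_repr` plays the role of the utterance-level graph representation that a classifier would map to an emotion label. Sum aggregation and sum readout are used because they are the injective choices that give GIN its discriminative power.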