C-GCN: Correlation Based Graph Convolutional Network for Audio-Video Emotion Recognition

Weizhi Nie, Minjie Ren, Jie Nie, Sicheng Zhao
DOI: https://doi.org/10.1109/tmm.2020.3032037
IF: 7.3
2020-01-01
IEEE Transactions on Multimedia
Abstract: With advances in both hardware and deep neural networks, the performance of automatic emotion recognition (AER) on video data has improved substantially. However, AER remains a challenging task because expressions are subtle, emotion is an abstract concept, and multi-modal information is difficult to represent. Most existing approaches focus on multi-modal feature learning and fusion strategies, which attend to the characteristics of a single video and ignore the correlation among videos. To explore this correlation, we propose a novel correlation-based graph convolutional network (C-GCN) for AER, which considers the correlation among both intra-class and inter-class videos for feature learning and information fusion. More specifically, we introduce a graph model to represent the correlation among the videos. This correlation information helps to improve the discriminability of node features during graph convolution. Meanwhile, a multi-head attention mechanism is applied to predict the hidden relationships among the videos, which can strengthen the inter-class correlation and improve classifier performance. The C-GCN is evaluated on the AFEW datasets and the eNTERFACE'05 dataset. The experimental results demonstrate the superiority of the proposed method over state-of-the-art methods.
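The abstract does not say how the correlation graph is built or how the convolution is parameterized, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes each video is already summarized by a fixed-length feature vector, links each video to its k most similar videos by cosine similarity, and applies one standard symmetric-normalized graph convolution step. The function names, the kNN rule, the feature sizes, and the random data are all illustrative assumptions; the multi-head attention component is not shown.

```python
import numpy as np

def correlation_graph(features, k=2):
    """Build a symmetric k-nearest-neighbour graph from cosine similarity.

    Assumption: kNN on cosine similarity stands in for whatever
    correlation measure the paper actually uses.
    """
    # Normalize rows so the dot product equals cosine similarity.
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # never pick a video as its own neighbour
    n = sim.shape[0]
    adj = np.zeros((n, n))
    neighbours = np.argsort(-sim, axis=1)[:, :k]  # k most similar per row
    rows = np.repeat(np.arange(n), k)
    adj[rows, neighbours.ravel()] = 1.0
    return np.maximum(adj, adj.T)  # make the graph undirected

def gcn_layer(adj, features, weight):
    """One propagation step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ features @ weight, 0.0)

rng = np.random.default_rng(0)
video_features = rng.normal(size=(6, 8))    # 6 hypothetical videos, 8-dim features
weight = rng.normal(size=(8, 4))            # hypothetical layer weights
adj = correlation_graph(video_features, k=2)
hidden = gcn_layer(adj, video_features, weight)
```

After this step each row of `hidden` mixes a video's own features with those of its graph neighbours, which is the sense in which correlated videos can influence one another's node features.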