Video-based Emotion Recognition Using Aggregated Features and Spatio-temporal Information

Jinchang Xu, Yuan Dong, Lilei Ma, Hongliang Bai
DOI: https://doi.org/10.1109/icpr.2018.8545441
2018-01-01
Abstract: In this paper, we present a video-based emotion recognition system for in-the-wild conditions. The system consists of four pipeline modules: image processing, deep feature extraction, feature aggregation, and emotion classification. Our method places particular emphasis on the choice of feature descriptors. To obtain high-level features that are more discriminative for emotion recognition, we aggregate features extracted from several different deep convolutional neural networks (CNNs). We also use a long short-term memory network (LSTM) and 3D convolutional networks (C3D) to extract spatio-temporal features from the videos, so that spatial and temporal information are combined. Finally, we evaluate our method on the video-based emotion recognition category of the 5th Emotion Recognition in the Wild Challenge, and the results show that our proposed system achieves better performance.
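The abstract does not specify how frame-level CNN features are aggregated or how the spatial and temporal branches are combined. The sketch below shows one common way to do both, but several parts of it are assumptions and not the authors' method. These include mean-and-max pooling over frames for each CNN, concatenation of the pooled vectors in sorted network order, and a weighted average of per-class scores for late fusion. The function names `pool_frames`, `aggregate_video`, and `fuse_scores`, the branch names, and the weights are all hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def pool_frames(frames: Sequence[Vector]) -> Vector:
    """Assumed aggregation: mean- and max-pool per-frame feature vectors
    into one fixed-length vector of size 2 * dim (means first, then maxima)."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    if any(len(f) != dim for f in frames):
        raise ValueError("all frame features must have the same length")
    n = len(frames)
    means = [sum(f[i] for f in frames) / n for i in range(dim)]
    maxes = [max(f[i] for f in frames) for i in range(dim)]
    return means + maxes


def aggregate_video(features_by_cnn: Dict[str, Sequence[Vector]]) -> Vector:
    """Pool each (hypothetical) CNN's frame features separately, then
    concatenate the pooled vectors in sorted network-name order."""
    video_vec: Vector = []
    for name in sorted(features_by_cnn):
        video_vec.extend(pool_frames(features_by_cnn[name]))
    return video_vec


def fuse_scores(branch_scores: Dict[str, Vector],
                weights: Dict[str, float]) -> Vector:
    """Assumed late fusion: weighted average of per-class scores from several
    branches (e.g. an aggregated-CNN classifier, an LSTM, a C3D model)."""
    total = sum(weights[b] for b in branch_scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(next(iter(branch_scores.values())))
    fused = [0.0] * n_classes
    for branch, scores in branch_scores.items():
        for k in range(n_classes):
            fused[k] += weights[branch] * scores[k] / total
    return fused
```

Here is a usage example with toy numbers. `aggregate_video({"cnn_a": [[0.0, 1.0], [2.0, 3.0]], "cnn_b": [[1.0], [5.0]]})` returns `[1.0, 2.0, 2.0, 3.0, 3.0, 5.0]`. The predicted emotion would then be the index of the largest entry of `fuse_scores(...)`. Pooling over frames makes the video descriptor independent of clip length, which is why it is a common choice. The paper's actual aggregation and fusion may differ.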