Video Emotion Recognition using Hand-Crafted and Deep Learning Features
Xiaohan Xia, Jiamu Liu, Tao Yang, Dongmei Jiang, Wenjing Han, Hichem Sahli
DOI: https://doi.org/10.1109/ACIIAsia.2018.8470326
2018-01-01
Abstract: In this paper, we present our system designed for the video emotion recognition task of the Multimodal Emotion Challenge (MEC 2017). Histogram of Oriented Gradients (HOG), face shape (SHAPE), and geometric (GEO) features are extracted from the detected face images as hand-crafted video features. A pre-trained VGG-Face model is fine-tuned with the face images and emotion labels from the training set of CHEAVD 2.0, and the outputs of the penultimate fully-connected layer (FC6) and the last fully-connected layer (FC7) are adopted as Deep Convolutional Neural Network (DCNN) based features. For each video clip, the hand-crafted features and the DCNN-based features are each fed into their own set of hidden Markov models (HMMs, one per emotion class) for initial emotion recognition. The log-likelihoods output by the HMMs are then ranked, and the resulting rank orders form an eight-dimensional feature vector that is input to a Naive Bayes classifier for decision fusion. Experimental results on the CHEAVD 2.0 database show that the combination of FC6, GEO, SHAPE and HOG features achieves the highest macro average precision (MAP) on both the validation set (46.61%) and the test set (43.88%), which are 12.51% and 22.18% higher than the baseline results, respectively.
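The rank-based decision fusion described in the abstract can be illustrated with a minimal sketch. It assumes the per-class HMM log-likelihoods for a clip are already available as a list of eight numbers; the names `rank_vector` and `CategoricalNaiveBayes`, the Laplace smoothing, and the sample scores are illustrative assumptions, not details taken from the paper, and the abstract does not specify how rank vectors from several feature types are combined.

```python
import math
from collections import Counter, defaultdict


def rank_vector(log_likelihoods):
    """Return the rank of each class score (1 = highest log-likelihood)."""
    order = sorted(range(len(log_likelihoods)),
                   key=lambda i: log_likelihoods[i], reverse=True)
    ranks = [0] * len(log_likelihoods)
    for position, class_index in enumerate(order, start=1):
        ranks[class_index] = position
    return ranks


class CategoricalNaiveBayes:
    """Naive Bayes over discrete rank features (Laplace smoothing is an assumption)."""

    def fit(self, X, y, n_values):
        self.n_values = n_values          # number of distinct values a feature can take
        self.class_counts = Counter(y)
        self.total = len(y)
        # counts[label][feature_index][value] = number of training occurrences
        self.counts = defaultdict(lambda: defaultdict(Counter))
        for features, label in zip(X, y):
            for j, value in enumerate(features):
                self.counts[label][j][value] += 1
        return self

    def predict(self, features):
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            # log prior plus the sum of smoothed log conditional probabilities
            score = math.log(n_label / self.total)
            for j, value in enumerate(features):
                numerator = self.counts[label][j][value] + 1
                score += math.log(numerator / (n_label + self.n_values))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical log-likelihoods from eight class-specific HMMs for one clip.
scores = [-120.5, -98.2, -110.0, -101.7, -130.1, -99.9, -125.3, -115.8]
print(rank_vector(scores))  # [6, 1, 4, 3, 8, 2, 7, 5]
```

Using ranks instead of raw log-likelihoods makes the fusion input independent of the scale of each HMM's scores, which differ across feature types; the classifier above would be trained on rank vectors from labelled clips and then applied to the rank vector of a new clip.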