Multimodal Continuous Emotion Recognition with Data Augmentation Using Recurrent Neural Networks

Jian Huang, Ya Li, Jianhua Tao, Zheng Lian, Mingyue Niu, Minghao Yang
DOI: https://doi.org/10.1145/3266302.3266304
2018-01-01
Abstract: This paper presents our efforts for the Cross-cultural Emotion Sub-challenge of the Audio/Visual Emotion Challenge (AVEC) 2018, whose goal is to predict the level of three emotional dimensions time-continuously in a cross-cultural setup. We extract emotional features from the audio, visual and textual modalities. We use the long short-term memory recurrent neural network (LSTM-RNN), the state-of-the-art regressor for continuous emotion recognition. We augment the training data by replacing the original training samples with shorter overlapping samples extracted from them, which multiplies the number of training samples and also benefits the training of a temporal emotion model with the LSTM-RNN. In addition, we explore two strategies for reducing the influence of the interlocutor in order to improve performance. We also compare feature-level fusion with decision-level fusion. The experimental results show the effectiveness of the proposed method, and competitive results are obtained.
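The augmentation step described in the abstract, replacing each training sequence with shorter overlapping segments, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `split_into_overlapping_windows`, the parameters `window_len` and `hop_len`, and the handling of the final frames are assumptions made for illustration, since the abstract gives no window length or overlap.

```python
import numpy as np


def split_into_overlapping_windows(features, labels, window_len, hop_len):
    """Cut one sequence into shorter overlapping segments.

    features: array of shape (T, D), frame-level features.
    labels:   array of shape (T, K), frame-level emotion annotations.
    window_len, hop_len: hypothetical values in frames; hop_len < window_len
    gives overlapping segments. Neither value comes from the paper.
    """
    if window_len <= 0 or hop_len <= 0:
        raise ValueError("window_len and hop_len must be positive")
    if features.shape[0] != labels.shape[0]:
        raise ValueError("features and labels must have the same length")

    n_frames = features.shape[0]
    # A sequence no longer than one window is kept whole.
    if n_frames <= window_len:
        return [(features, labels)]

    starts = list(range(0, n_frames - window_len + 1, hop_len))
    # Assumption: add one last window aligned to the end so no frame is dropped.
    if starts[-1] + window_len < n_frames:
        starts.append(n_frames - window_len)

    return [(features[s:s + window_len], labels[s:s + window_len]) for s in starts]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(1000, 88))   # 1000 frames, 88-dim features (made up)
    labs = rng.normal(size=(1000, 3))     # three emotional dimensions per frame
    segments = split_into_overlapping_windows(feats, labs, window_len=200, hop_len=100)
    print(len(segments), segments[0][0].shape, segments[0][1].shape)
```

Each original sequence yields several shorter training samples, which is the sense in which the number of samples is multiplied. The segments would then be fed to the sequence model in place of the full-length recordings.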