FAF: A novel multimodal emotion recognition approach integrating face, body and text

Zhongyu Fang, Aoyun He, Qihui Yu, Baopeng Gao, Weiping Ding, Tong Zhang, Lei Ma

DOI: https://doi.org/10.48550/arXiv.2211.15425

2022-11-20

Abstract: Multimodal emotion analysis performs better in emotion recognition because it relies on more comprehensive emotional clues and multimodal emotion datasets. In this paper, we developed a large multimodal emotion dataset, named "HED", to facilitate the emotion recognition task, and accordingly propose a multimodal emotion recognition method. To improve recognition accuracy, a "Feature After Feature" framework is used to extract crucial emotional information from aligned face, body and text samples. We employ various benchmarks to evaluate the "HED" dataset and compare their performance with that of our method. The results show that the five-class classification accuracy of the proposed multimodal fusion method is about 83.75%, an improvement of 1.83%, 9.38%, and 21.62%, respectively, over the individual modalities. The complementarity between the channels is thus effectively exploited to improve emotion recognition performance. We have also established a multimodal online emotion prediction platform, aiming to provide free emotion prediction to more users.
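The reported gains can be turned into implied single-modality accuracies with simple arithmetic. This is a minimal sketch under a stated assumption: the abstract does not say whether the gains are absolute percentage points or relative improvements, nor which gain belongs to which modality, so the code assumes absolute percentage points and keeps the gains in the order the abstract lists them. The names `FUSION_ACCURACY`, `REPORTED_GAINS` and `implied_single_modality_accuracies` are hypothetical.

```python
# ASSUMPTION: the gains (1.83, 9.38, 21.62) are absolute percentage-point
# improvements of the fused model over three single-modality models.
# The abstract does not state which gain corresponds to which modality.
FUSION_ACCURACY = 83.75  # five-class accuracy of the fused model, in percent
REPORTED_GAINS = [1.83, 9.38, 21.62]  # percentage points, in abstract order


def implied_single_modality_accuracies(fused, gains):
    """Subtract each gain from the fused accuracy, rounded to two decimals."""
    return [round(fused - g, 2) for g in gains]


accuracies = implied_single_modality_accuracies(FUSION_ACCURACY, REPORTED_GAINS)
print(accuracies)  # [81.92, 74.37, 62.13]
```

If the gains were instead relative improvements, each single-modality accuracy would be 83.75 / (1 + gain / 100), which gives somewhat higher values.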

Computer Vision and Pattern Recognition, Artificial Intelligence

What problem does this paper attempt to address?

The paper addresses the limitation that single-modal emotion recognition methods cannot fully capture how humans express emotion. The authors therefore propose a new multimodal emotion recognition method that aims to improve recognition accuracy by integrating facial expressions, body posture, and textual information. The main contributions of the paper are:

1. **Constructing a high-quality multimodal emotion dataset**: The authors built a large multimodal emotion dataset named "HED", which contains aligned face, body, and text samples annotated with five emotions (happiness, sadness, disgust, anger, and fear).
2. **Proposing a new multimodal emotion recognition method**: The authors propose a framework called "Feature After Feature" (FAF) for extracting key emotional information from aligned face-body-text samples. The framework extracts image features with a residual network, extracts textual features with BERT word vectors, and fuses the modalities through convolutional layers and attention mechanisms to exploit high-level complementary information.
3. **Establishing an online multimodal emotion prediction platform**: To verify the effectiveness of the method and demonstrate its application value, the authors developed an online platform that provides both single-modal and multimodal emotion recognition.

Through these contributions, the paper aims to address shortcomings of existing emotion recognition methods, such as low-quality datasets, inaccurate multimodal fusion algorithms, and high computational complexity, thereby improving the accuracy and robustness of emotion recognition.
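The attention-based fusion step in contribution 2 can be illustrated with a minimal, hypothetical sketch. It is not the authors' FAF implementation: the ResNet and BERT extractors are replaced by small hand-written feature vectors, the convolutional layers are omitted, and the attention vector `query`, the classifier weights, and the helpers `attention_fuse` and `classify` are illustrative assumptions. The sketch shows dot-product attention weighting each modality's feature vector, summing the weighted vectors into one fused vector, and mapping that vector to the five HED emotion classes with a linear layer and a softmax.

```python
import math
from typing import Dict, List, Tuple

# The five emotion classes of the HED dataset, as listed in the summary.
EMOTIONS = ["happiness", "sadness", "disgust", "anger", "fear"]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_fuse(
    features: Dict[str, List[float]], query: List[float]
) -> Tuple[List[float], Dict[str, float]]:
    """Weight each modality vector by softmax(query . feature) and sum them.

    `features` maps a modality name (e.g. "face") to a vector standing in for
    the output of a hypothetical ResNet/BERT extractor; all vectors share one
    dimension. `query` is a hypothetical learned attention vector.
    """
    names = list(features)
    weights = softmax([dot(query, features[n]) for n in names])
    dim = len(query)
    fused = [
        sum(w * features[n][i] for w, n in zip(weights, names)) for i in range(dim)
    ]
    return fused, dict(zip(names, weights))


def classify(
    fused: List[float], weight_rows: List[List[float]], bias: List[float]
) -> Tuple[str, List[float]]:
    """Linear layer (one row per emotion), then softmax and argmax."""
    logits = [dot(row, fused) + b for row, b in zip(weight_rows, bias)]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTIONS[best], probs


# Toy usage with made-up 3-dimensional features (not real model outputs).
features = {
    "face": [1.0, 0.0, 0.5],
    "body": [0.2, 0.1, 0.0],
    "text": [0.8, 0.0, 0.3],
}
fused, modality_weights = attention_fuse(features, query=[1.0, 0.0, 0.0])
label, probs = classify(
    fused,
    weight_rows=[[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0], [0, 0, 0]],
    bias=[0.0] * 5,
)
print(label, {k: round(v, 3) for k, v in modality_weights.items()})
```

Because the attention weights come from a softmax, they sum to one, so the fused vector is a convex combination of the modality vectors and a more informative modality (here the toy "face" vector) contributes more. In a trained model, `query` and the classifier weights would be learned rather than set by hand.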

Emotion Recognition in Videos via Fusing Multimodal Features

A Efficient Multimodal Framework for Large Scale Emotion Recognition by Fusing Music and Electrodermal Activity Signals

A Novel Emotion-Aware Method Based on the Fusion of Textual Description of Speech, Body Movements, and Facial Expressions

Multimodal Emotion Recognition Based on Feature Selection and Extreme Learning Machine in Video Clips

Multimodal Emotion Recognition Based on Facial Expressions, Speech, and Body Gestures

Multimodal Emotion Recognition Based on Feature Fusion

Multimodal Emotion Recognition Based on Cascaded Multichannel and Hierarchical Fusion

Combining Multimodal Features Within A Fusion Network For Emotion Recognition In The Wild

MF-Net: a multimodal fusion network for emotion recognition based on multiple physiological signals

A novel feature fusion network for multimodal emotion recognition from EEG and eye movement signals

Multi-head attention fusion networks for multi-modal speech emotion recognition

Multi-modal fusion network with complementarity and importance for emotion recognition

A Novel Supervised Bimodal Emotion Recognition Approach Based on Facial Expression and Body Gesture

Emotion Recognition from Multiple Physiological Signals Using Intra- and Inter-Modality Attention Fusion Network

User Emotion Recognition Method Based on Facial Expression and Speech Signal Fusion

Multimodal Facial Expression Recognition Based on Dempster-Shafer Theory Fusion Strategy

E-MFNN: an emotion-multimodal fusion neural network framework for emotion recognition

Multi-Modal Fusion Emotion Recognition Method of Speech Expression Based on Deep Learning

Multimodal emotion recognition from facial expression and speech based on feature fusion

Investigation of Multimodal Features, Classifiers and Fusion Methods for Emotion Recognition