FAF: A novel multimodal emotion recognition approach integrating face, body and text

Zhongyu Fang, Aoyun He, Qihui Yu, Baopeng Gao, Weiping Ding, Tong Zhang, Lei Ma
DOI: https://doi.org/10.48550/arXiv.2211.15425
2022-11-20
Abstract: Multimodal emotion analysis performs better at emotion recognition because it draws on more comprehensive emotional cues and multimodal emotion datasets. In this paper, we develop a large multimodal emotion dataset, named "HED", to facilitate the emotion recognition task, and propose a corresponding multimodal emotion recognition method. To improve recognition accuracy, a "Feature After Feature" framework is used to extract crucial emotional information from the aligned face, body and text samples. We evaluate various benchmark methods on the "HED" dataset and compare their performance with that of our method. The results show that the five-class classification accuracy of the proposed multimodal fusion method is about 83.75%, an improvement of 1.83%, 9.38% and 21.62%, respectively, over the individual modalities. The complementarity between channels is thus effectively exploited to improve emotion recognition performance. We have also established a multimodal online emotion prediction platform, aiming to provide free emotion prediction to more users.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses a limitation of single-modal emotion recognition: a single modality often cannot fully capture how humans express emotion. The authors therefore propose a new multimodal emotion recognition method that aims to improve accuracy by integrating facial expressions, body posture, and textual information. The main contributions are:

1. **Constructing a high-quality multimodal emotion dataset**: The authors built a large multimodal emotion dataset named "HED", which contains aligned facial, body, and textual samples labeled with five emotions (happiness, sadness, disgust, anger, and fear).
2. **Proposing a new multimodal emotion recognition method**: The authors propose a framework called "Feature After Feature" (FAF) that extracts key emotional information from aligned face-body-text samples. It extracts image features with a residual network, extracts textual features with BERT word vectors, and fuses the multimodal information through convolutional layers and attention mechanisms to explore high-level complementary information.
3. **Establishing an online multimodal emotion prediction platform**: To verify the effectiveness of the method and demonstrate its potential for future applications, the authors developed an online platform that offers both single-modal and multimodal emotion recognition.

Through these contributions, the paper aims to address problems in existing emotion recognition methods, such as low-quality datasets, inaccurate multimodal fusion algorithms, and high computational complexity, thereby improving the accuracy and robustness of emotion recognition.
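To make the idea of attention-based multimodal fusion in the second contribution more concrete, here is a minimal, hypothetical sketch in plain Python. It is not the FAF architecture: the summary does not give the fusion equations, and this sketch omits the convolutional layers entirely. The face, body, and text vectors are toy stand-ins for ResNet and BERT features, assumed to be already projected to a common dimension. The names `attention_fuse`, `classify`, and the fixed `query` vector (which would normally be learned) are illustrative assumptions. The sketch shows a generic scaled dot-product attention pooling over modalities followed by a linear five-class classifier.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]

# The five emotion labels used in the HED dataset.
EMOTIONS = ["happiness", "sadness", "disgust", "anger", "fear"]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_fuse(features: Dict[str, Vector], query: Vector) -> Tuple[Vector, Dict[str, float]]:
    """Hypothetical fusion: score each modality against a query vector,
    normalize the scores with softmax, and return the weighted sum."""
    names = list(features)
    d = len(query)
    # Scaled dot-product score per modality (scaled by sqrt of the dimension).
    scores = [dot(features[n], query) / math.sqrt(d) for n in names]
    weights = softmax(scores)
    fused = [0.0] * d
    for w, n in zip(weights, names):
        for i, x in enumerate(features[n]):
            fused[i] += w * x
    return fused, dict(zip(names, weights))


def classify(fused: Vector, weight_rows: List[Vector], biases: List[float]) -> List[float]:
    """Linear layer (one weight row per emotion class) followed by softmax."""
    logits = [dot(row, fused) + b for row, b in zip(weight_rows, biases)]
    return softmax(logits)


if __name__ == "__main__":
    # Toy 4-dimensional features; real ResNet/BERT features are far larger.
    feats = {
        "face": [0.9, 0.1, 0.0, 0.2],
        "body": [0.4, 0.3, 0.1, 0.0],
        "text": [0.7, 0.0, 0.2, 0.1],
    }
    fused, weights = attention_fuse(feats, query=[1.0, 0.0, 0.0, 0.0])
    # Toy classifier weights: class k reads feature k; "fear" gets all zeros.
    rows = [[1.0 if i == k else 0.0 for i in range(4)] for k in range(4)] + [[0.0] * 4]
    probs = classify(fused, rows, [0.0] * 5)
    best = max(range(len(EMOTIONS)), key=lambda k: probs[k])
    print(EMOTIONS[best], {n: round(w, 3) for n, w in weights.items()})
```

With these toy inputs the face vector aligns best with the query, so it receives the largest attention weight. The point of weighting rather than simply concatenating is that the model can lean on whichever channel is most informative for a given sample, which is one way the complementarity between modalities can be exploited.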