Affective Behaviour Analysis via Integrating Multi-Modal Knowledge

Wei Zhang, Feng Qiu, Chen Liu, Lincheng Li, Heming Du, Tiancheng Guo, Xin Yu

2024-03-16

Abstract: Affective Behavior Analysis aims to make technology emotionally intelligent, creating a world where devices can understand and react to our emotions as humans do. To comprehensively evaluate the authenticity and applicability of affective behavior analysis techniques in natural environments, the 6th Affective Behavior Analysis in-the-wild (ABAW) competition uses the Aff-Wild2, Hume-Vidmimic2, and C-EXPR-DB datasets to set up five competition tracks: Valence-Arousal (VA) Estimation, Expression (EXPR) Recognition, Action Unit (AU) Detection, Compound Expression (CE) Recognition, and Emotional Mimicry Intensity (EMI) Estimation. In this paper, we present our method designs for the five tasks. Our design has three main aspects: 1) a transformer-based feature fusion module that fully integrates the emotional information provided by audio signals, visual images, and transcripts, offering high-quality expression features for the downstream tasks; 2) a Masked Autoencoder (MAE), fine-tuned on our facial dataset, used as the visual feature extraction model to obtain high-quality facial feature representations; 3) to cope with the complexity of the video collection scenes, a finer-grained division of the dataset based on scene characteristics, with a classifier trained for each scene. Extensive experiments demonstrate the superiority of our designs.
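
The fusion idea in point 1) can be illustrated with a minimal, pure-Python sketch of transformer-style fusion: each modality (audio, visual, transcript) contributes one feature vector, single-head scaled dot-product self-attention lets every modality token attend to the others, and the attended tokens (plus a residual connection) are mean-pooled into one fused vector. This is not the authors' module. The one-token-per-modality layout, the 8-dimensional features, the random weights standing in for learned projections, and the mean pooling are all assumptions made for illustration; a practical version would use a deep-learning framework with multiple heads, stacked layers, and trained parameters.

```python
import math
import random
from typing import List

# Minimal sketch, NOT the authors' fusion module. Assumptions: one feature
# vector (token) per modality, a single attention head, random weights in
# place of learned ones, and mean pooling to produce the fused feature.

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (d_out x d_in) matrix by a length-d_in vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Hypothetical stand-in for a learned projection matrix."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def fuse_modalities(tokens: List[Vector], w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Vector:
    """Fuse one token per modality with single-head self-attention.

    All projections are square (d x d) so the residual connection
    between each token and its attended output is well defined.
    """
    d = len(w_q)
    queries = [matvec(w_q, t) for t in tokens]
    keys = [matvec(w_k, t) for t in tokens]
    values = [matvec(w_v, t) for t in tokens]

    mixed = []
    for q, tok in zip(queries, tokens):
        # Scaled dot-product scores of this query against every modality's key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Attention-weighted sum of the value vectors.
        attended = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d)]
        # Residual connection: add the input token back.
        mixed.append([a + t for a, t in zip(attended, tok)])

    # Mean-pool the modality tokens into a single fused feature vector.
    return [sum(col) / len(mixed) for col in zip(*mixed)]


rng = random.Random(0)
dim = 8
audio, visual, text = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(3)]
w_q, w_k, w_v = [random_matrix(dim, dim, rng) for _ in range(3)]
fused = fuse_modalities([audio, visual, text], w_q, w_k, w_v)
print(len(fused))  # 8
```

Attention lets the weight given to each modality depend on the input itself, which is the usual motivation for transformer-based fusion over simple feature concatenation.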

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses several problems in affective behavior analysis:

1. **Multimodal Emotion Information Fusion**: integrating emotional information from multimodal data, namely audio signals, visual images, and text transcripts, to improve accuracy on the emotion analysis tasks.
2. **High-Quality Facial Feature Extraction**: using a large-scale facial image dataset and the self-supervised Masked Autoencoder (MAE) to learn deep feature representations that improve performance on the downstream tasks.
3. **Model Generalization Ability in Complex Backgrounds**: adopting an ensemble learning strategy that divides the dataset into subsets according to scene characteristics, trains a classifier on each subset, and finally combines the classifiers' outputs through a voting mechanism to improve generalization across different environments.

In summary, within the ABAW competition, whose goal is to comprehensively evaluate the authenticity and applicability of affective behavior analysis techniques in natural environments, the paper proposes a method design that handles multimodal data effectively and maintains good performance in complex backgrounds.
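
The ensemble step in point 3 can be sketched as hard (majority) voting over scene-specific classifiers. This is an illustration under stated assumptions, not the authors' code: the scene names (`indoor`, `outdoor`, `low_light`), the toy threshold classifiers, the two-dimensional features, and the rule that ties go to the first classifier's prediction are all hypothetical, since the summary does not specify the scene split or the exact voting rule.

```python
from collections import Counter
from typing import Callable, Dict, Sequence

# Hypothetical type: a scene-specific classifier maps features to a label.
Classifier = Callable[[Sequence[float]], str]


def majority_vote(labels: Sequence[str]) -> str:
    """Hard voting: return the most frequent label.

    Ties are broken in favour of the label encountered first, since
    Counter.most_common keeps first-encountered order for equal counts.
    """
    if not labels:
        raise ValueError("majority_vote needs at least one prediction")
    return Counter(labels).most_common(1)[0][0]


def ensemble_predict(classifiers: Dict[str, Classifier], features: Sequence[float]) -> str:
    """Ask every scene-specific classifier for a label, then vote."""
    votes = [clf(features) for clf in classifiers.values()]
    return majority_vote(votes)


# Toy stand-ins for classifiers trained on scene-based subsets (hypothetical).
classifiers: Dict[str, Classifier] = {
    "indoor": lambda x: "Happiness" if x[0] > 0.5 else "Neutral",
    "outdoor": lambda x: "Happiness" if x[1] > 0.5 else "Sadness",
    "low_light": lambda x: "Neutral",
}

print(ensemble_predict(classifiers, [0.9, 0.7]))  # Happiness
print(ensemble_predict(classifiers, [0.1, 0.2]))  # Neutral
```

Hard voting only needs each classifier's predicted label, so it works even when the scene-specific classifiers are calibrated differently; averaging class probabilities (soft voting) is a common alternative when the classifiers output scores.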
