Affective Behaviour Analysis via Progressive Learning

Chen Liu, Wei Zhang, Feng Qiu, Lincheng Li, Xin Yu
2024-07-26
Abstract: Affective Behavior Analysis aims to develop emotionally intelligent technology that can recognize and respond to human emotions. To advance this, the 7th Affective Behavior Analysis in-the-wild (ABAW) competition establishes two tracks, i.e., the Multi-task Learning (MTL) Challenge and the Compound Expression (CE) Challenge, based on the Aff-Wild2 and C-EXPR-DB datasets. In this paper, we present our methods and experimental results for the two competition tracks. Specifically, our work can be summarized in the following four aspects: 1) To attain high-quality facial features, we train a Masked-Auto Encoder in a self-supervised manner. 2) We devise a temporal convergence module to capture the temporal information between video frames and explore the impact of window size and sequence length on each sub-task. 3) To facilitate the joint optimization of various sub-tasks, we explore how joint training of sub-tasks and fusion of features from individual tasks affect the performance of each task. 4) We utilize curriculum learning to transition the model from recognizing single expressions to recognizing compound expressions, thereby improving the accuracy of compound expression recognition. Extensive experiments demonstrate the superiority of our designs.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

The paper focuses on Affective Behavior Analysis, which aims to develop intelligent technologies capable of recognizing and responding to human emotions. It addresses this goal through two competition tracks:

1. **Multi-Task Learning (MTL) Challenge**:
   - This challenge is based on the Aff-Wild2 dataset and includes three sub-tasks: Action Unit (AU) prediction, Expression Recognition (EXPR), and Valence-Arousal (VA) estimation.
   - The authors extract high-quality facial features through self-supervised training of a Masked-Auto Encoder (MAE) and design a temporal aggregation module to capture temporal information between video frames.
2. **Compound Expression (CE) Challenge**:
   - This challenge, based on the C-EXPR-DB dataset, requires recognizing compound expressions with limited annotated data.
   - The authors adopt a progressive learning strategy: they first train a single-expression recognition model, then gradually introduce compound expression data, using CutMix and Mixup to increase data diversity.

With these methods, the authors report performance improvements in both competition tracks, which they take as validation of their approach.
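The summary names CutMix and Mixup for the CE track but gives no configuration, so the following is only a generic NumPy sketch of the two augmentations, not the authors' implementation. The function names, the one-hot label layout, and the toy 8x8 images are all assumptions made for illustration; in practice the mixing weight `lam` is usually sampled from a Beta distribution rather than fixed.

```python
import numpy as np


def mixup(x_a, y_a, x_b, y_b, lam):
    """Blend two images and their one-hot labels with weight lam in [0, 1]."""
    x = lam * x_a + (1.0 - lam) * x_b
    y = lam * y_a + (1.0 - lam) * y_b
    return x, y


def cutmix(x_a, y_a, x_b, y_b, lam, rng):
    """Paste a random box from x_b into x_a; mix labels by the pasted area."""
    h, w = x_a.shape[:2]
    # Box side lengths chosen so the box covers roughly (1 - lam) of the image.
    cut_h = int(h * np.sqrt(1.0 - lam))
    cut_w = int(w * np.sqrt(1.0 - lam))
    cy, cx = rng.integers(h), rng.integers(w)
    top, bottom = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    left, right = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)

    x = x_a.copy()
    x[top:bottom, left:right] = x_b[top:bottom, left:right]

    # The box may be clipped at the border, so recompute the label weight
    # from the area that was actually pasted.
    lam_adj = 1.0 - (bottom - top) * (right - left) / (h * w)
    y = lam_adj * y_a + (1.0 - lam_adj) * y_b
    return x, y
```

Mixing two single-expression samples this way yields an image with a soft label spread over two classes, which is one plausible reason such augmentations suit the step from single to compound expressions. Whether the authors mix samples in exactly this manner is not stated in the summary.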