OUS: Scene-Guided Dynamic Facial Expression Recognition

Xinji Mai, Haoran Wang, Zeng Tao, Junxiong Lin, Shaoqi Yan, Yan Wang, Jing Liu, Jiawen Yu, Xuan Tong, Yating Li, Wenqiang Zhang
2024-05-29
Abstract: Dynamic Facial Expression Recognition (DFER) is crucial for affective computing but often overlooks the impact of scene context. We have identified a significant issue in current DFER tasks: human annotators typically integrate emotions from various angles, including environmental cues and body language, whereas existing DFER methods tend to consider the scene as noise that needs to be filtered out, focusing solely on facial information. We refer to this as the Rigid Cognitive Problem. The Rigid Cognitive Problem can lead to discrepancies between the cognition of annotators and models in some samples. To align more closely with the human cognitive paradigm of emotions, we propose an Overall Understanding of the Scene DFER method (OUS). OUS effectively integrates scene and facial features, combining scene-specific emotional knowledge for DFER. Extensive experiments on the two largest datasets in the DFER field, DFEW and FERV39k, demonstrate that OUS significantly outperforms existing methods. By analyzing the Rigid Cognitive Problem, OUS successfully understands the complex relationship between scene context and emotional expression, closely aligning with human emotional understanding in real-world scenarios.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses a mismatch in Dynamic Facial Expression Recognition (DFER): existing methods rely almost entirely on facial information and neglect scene context, which leads to poor performance when the emotion to be classified is ambiguous. Specifically, human annotators consider environmental cues and body language when labeling expressions, whereas current DFER methods typically treat scene information as noise and filter it out, focusing only on facial features. The paper calls this discrepancy the "Rigid Cognitive Problem": the model's cognitive pattern diverges from that of the annotators, which reduces recognition accuracy on some samples. To address it, the paper proposes OUS, an Overall Understanding of the Scene DFER method. OUS integrates scene and facial features and combines them with scene-specific emotional knowledge. According to the reported experiments, OUS significantly outperforms existing methods on the two largest DFER datasets, DFEW and FERV39k.