Abstract: We introduce an automatic detection method for facial action units (AUs) that leverages both spatial and temporal data to improve accuracy and robustness in expression analysis and facial animation. Our approach uses a temporal feature combination and feature reassignment (TC&FR) module to transform and fuse features across multiple subjects and temporal sequences. In addition, a Regional Attention (RA) encoder and a transformer model refine the extraction and processing of regional features, allowing more precise identification of AUs. This integration exploits identity‐independent features and temporal context to improve the reliability of AU predictions.

Facial action units (AUs) encode the activations of facial muscle groups and play a crucial role in expression analysis and facial animation. However, current deep learning AU detection methods primarily focus on single‐image analysis, which limits their use of the rich temporal context needed for robust outcomes. Moreover, available datasets remain limited in scale, so models trained on them tend to overfit. This paper proposes an AU detection method that integrates spatial and temporal data with inter‐subject feature reassignment for accurate and robust AU predictions. Our method first extracts regional features from facial images. Then, to capture both temporal context and identity‐independent features, we introduce the TC&FR module, which transforms single‐image features into a cohesive temporal sequence and fuses features across multiple subjects. This transformation encourages the model to rely on identity‐independent features and temporal context, leading to more robust predictions. Experimental results demonstrate the improvements brought by the proposed modules and the state‐of‐the‐art (SOTA) results achieved by our method.
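The abstract does not spell out how temporal combination and inter‐subject reassignment operate, so the following is only a minimal sketch under stated assumptions, not the paper's implementation. It assumes regional features are stored as an array of shape (subjects, frames, regions, dims), that each region carries its own AU label, and that reassignment means shuffling subjects independently per region. The function names `temporal_combine` and `reassign_regions` are hypothetical.

```python
import numpy as np


def temporal_combine(frame_feats):
    """Stack per-frame regional features, each (R, D), into a (T, R, D) sequence."""
    return np.stack(frame_feats, axis=0)


def reassign_regions(feats, labels, rng):
    """Hypothetical inter-subject feature reassignment (an assumption of this sketch).

    feats:  (S, T, R, D) regional features for S subjects over T frames
    labels: (S, T, R)    one AU label per region, assumed for illustration
    For each region, the subject axis is shuffled independently, so every
    output sample mixes regions from several subjects while each label
    stays attached to its feature.
    """
    num_subjects, _, num_regions, _ = feats.shape
    out_feats = np.empty_like(feats)
    out_labels = np.empty_like(labels)
    for r in range(num_regions):
        perm = rng.permutation(num_subjects)  # new subject order for region r
        out_feats[:, :, r] = feats[:, :, r][perm]
        out_labels[:, :, r] = labels[:, :, r][perm]
    return out_feats, out_labels
```

Under these assumptions, a model trained on the reassigned samples cannot rely on a single subject's identity across regions, which is one plausible reading of how such a module could encourage identity‐independent features.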

Facial action units detection using temporal context and feature reassignment
