Abstract: Current work on facial action unit (AU) recognition typically requires fully AU-labeled training samples. To reduce the reliance on time-consuming manual AU annotation, we propose a novel semi-supervised AU recognition method that leverages two kinds of readily available auxiliary information. The first is the dependencies between AUs and expressions, together with the dependencies among AUs, which are caused by facial anatomy and are therefore embedded in all facial images, independent of their AU annotation status. The second is facial image synthesis given AUs, the dual task of AU recognition from facial images, which therefore has intrinsic probabilistic connections with AU recognition regardless of AU annotations. Specifically, we propose a dual semi-supervised generative adversarial network for AU recognition from partially AU-labeled and fully expression-labeled facial images. The proposed network consists of an AU classifier C, an image generator G, and a discriminator D. In addition to minimizing the supervised losses of the AU classifier and the face generator on labeled training data, we exploit the probabilistic duality between the two tasks, using adversarial learning to force the distributions of the face-AU-expression tuples produced by the AU classifier and by the face generator to converge to the ground-truth distribution in the labeled data, for all training data. This joint distribution also captures the inherent AU dependencies. Furthermore, we reconstruct the facial image by using the output of the AU classifier as the input of the face generator, and create AU labels by feeding the output of the face generator to the AU classifier. We minimize the reconstruction losses for all training data, thus exploiting the informative feedback provided by the dual tasks. Within-database and cross-database experiments on three benchmark databases demonstrate the superiority of our method in both AU recognition and face synthesis compared to state-of-the-art works.
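
The way the supervised and reconstruction terms described in the abstract apply to labeled and unlabeled samples can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the classifier and generator are toy linear maps, the names `toy_classifier`, `toy_generator`, and `sample_losses` are hypothetical, the expression input and the adversarial term involving D are omitted, and the AU-side cycle is computed only when AU labels exist (the abstract does not say how AU inputs are obtained for unlabeled samples).

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def toy_classifier(image, weights):
    """Hypothetical stand-in for C: image (list of floats) -> AU probabilities."""
    return [sigmoid(sum(w * p for w, p in zip(row, image))) for row in weights]


def toy_generator(aus, weights):
    """Hypothetical stand-in for G: AU vector -> image (linear decoder)."""
    n_pixels = len(weights[0])
    return [sum(aus[k] * weights[k][j] for k in range(len(aus)))
            for j in range(n_pixels)]


def bce(probs, labels):
    """Mean binary cross-entropy over AUs."""
    eps = 1e-7
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for p, y in zip(probs, labels)) / len(labels)


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def sample_losses(image, au_labels, c_w, g_w):
    """Loss terms for one sample; au_labels is None when the sample is AU-unlabeled."""
    au_probs = toy_classifier(image, c_w)

    # Image cycle (image -> C -> G -> image): needs no AU labels,
    # so it is available for every training sample.
    losses = {"image_recon": mse(toy_generator(au_probs, g_w), image)}

    if au_labels is not None:
        # Supervised terms exist only for AU-labeled samples.
        losses["supervised_au"] = bce(au_probs, au_labels)
        generated = toy_generator(au_labels, g_w)
        losses["supervised_gen"] = mse(generated, image)
        # AU cycle (AUs -> G -> C -> AUs); restricted to labeled samples
        # in this sketch, which is a simplifying assumption.
        losses["au_recon"] = bce(toy_classifier(generated, c_w), au_labels)
    return losses


# Toy usage: 3-pixel "images" and 2 AUs, with arbitrary fixed weights.
c_w = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]   # one row per AU
g_w = [[0.2, 0.1, 0.4], [0.6, -0.3, 0.2]]    # one row per AU
image = [0.1, 0.9, 0.4]

print(sample_losses(image, None, c_w, g_w))    # unlabeled: image cycle only
print(sample_losses(image, [1, 0], c_w, g_w))  # labeled: all four terms
```

In a full training loop these per-sample terms would be summed with an adversarial term over the face-AU-expression tuples; how the terms are weighted is not stated in the abstract, so no weighting is shown here.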

Pursuing Knowledge Consistency: Supervised Hierarchical Contrastive Learning for Facial Action Unit Recognition

Learning Contrastive Feature Representations for Facial Action Unit Detection

Knowledge-Driven Self-Supervised Representation Learning for Facial Action Unit Recognition

Contrastive Learning of Person-independent Representations for Facial Action Unit Detection

Multi-scale Dynamic and Hierarchical Relationship Modeling for Facial Action Units Recognition

Facial Action Unit Representation based on Self-supervised Learning with Ensembled Priori Constraints

Weakly-Supervised Text-driven Contrastive Learning for Facial Behavior Understanding

Facial Action Unit Detection Using Attention and Relation Learning

AU-Expression Knowledge Constrained Representation Learning for Facial Expression Recognition

Pose-disentangled Contrastive Learning for Self-supervised Facial Representation

Facial Action Units Detection Aided by Global-Local Expression Embedding

Exploring Adversarial Learning for Deep Semi-Supervised Facial Action Unit Recognition

Weakly Supervised Facial Action Unit Recognition with Domain Knowledge

Multi-scale Promoted Self-adjusting Correlation Learning for Facial Action Unit Detection

Weakly Supervised Regional and Temporal Learning for Facial Action Unit Recognition

UCoL: Unsupervised Learning of Discriminative Facial Representations via Uncertainty-Aware Contrast

Exploring Domain Knowledge for Facial Expression-Assisted Action Unit Activation Recognition

Dual Semi-Supervised Learning for Facial Action Unit Recognition

Facial Action Unit Classification with Hidden Knowledge under Incomplete Annotation

Facial Action Unit Recognition and Intensity Estimation Enhanced Through Label Dependencies