Abstract: Facial expression recognition (FER) has significant practical value in real-world scenarios such as human–computer interaction, driver fatigue detection, and learning engagement analysis. However, acquiring large-scale, high-quality annotated facial expression datasets is difficult because facial images are inherently ambiguous and raise privacy concerns. This paper therefore introduces a self-supervised facial expression recognition method based on masked image modeling. The method learns multi-level facial feature representations without expensive labels and achieves strong recognition performance through further fine-grained feature selection. Specifically, we propose the multi-level feature selector (MFS), which comprises two components: the multi-level feature combiner and the feature selector. During the pre-training stage, the multi-level feature combiner integrates multi-level features, addressing the vision transformer's deficiencies in capturing high-frequency facial semantics. In the fine-tuning stage, the feature selector automatically identifies highly discriminative regions and extracts fine-grained features. We then use graph convolutional networks to mine the latent connections among these fine-grained features, deriving an integrated feature with enhanced discriminative capability. This fine-grained facial feature selection mitigates the performance degradation caused by inter-class similarities and intra-class variations. Experimental results on the RAF-DB, AffectNet, and FER+ datasets demonstrate that our approach significantly outperforms other self-supervised methods in recognition performance and closely approaches state-of-the-art supervised methods. The code is available at https://github.com/Greysahy/MFS
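
As a rough illustration of the masked image modeling idea the abstract builds on, the sketch below splits an image into patches and hides a random subset, leaving only the visible patches as encoder input. It is not taken from the MFS repository: the function names `patchify` and `random_patch_mask`, the 16-pixel patch size, and the 0.75 mask ratio are all assumptions chosen for illustration, and the abstract does not state which values the authors use.

```python
import numpy as np


def patchify(image, patch_size):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    gh, gw = h // patch_size, w // patch_size
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)  # (gh, gw, p, p, c)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def random_patch_mask(num_patches, mask_ratio, rng):
    """Return (keep_idx, mask); mask[i] is True where patch i is hidden."""
    num_masked = int(round(num_patches * mask_ratio))
    perm = rng.permutation(num_patches)
    mask = np.zeros(num_patches, dtype=bool)
    mask[perm[:num_masked]] = True
    keep_idx = np.sort(perm[num_masked:])
    return keep_idx, mask


# Hypothetical input: a random 224x224 RGB array standing in for a face crop.
rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))

patches = patchify(image, patch_size=16)       # assumed patch size
keep_idx, mask = random_patch_mask(len(patches), mask_ratio=0.75, rng=rng)
visible = patches[keep_idx]                    # patches the encoder would see
```

In a pre-training setup of this kind, a decoder would be trained to reconstruct the hidden patches from `visible`; how MFS combines multi-level features during that step is described in the paper and not reproduced here.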

Progressive Self-supervised Representation Learning for 3D Facial Expression Recognition

DR-FER: Discriminative and Robust Representation Learning for Facial Expression Recognition

cGAN Based Facial Expression Recognition for Human-Robot Interaction

Deep Representation of Facial Geometric and Photometric Attributes for Automatic 3D Facial Expression Recognition

Towards Reading Beyond Faces for Sparsity-aware 3D/4D Affect Recognition

Landmarks-assisted Collaborative Deep Framework for Automatic 4D Facial Expression Recognition

Automatic 4D Facial Expression Recognition via Collaborative Cross-domain Dynamic Image Network

Efficient Facial Expression Recognition with Representation Reinforcement Network and Transfer Self-Training for Human–Machine Interaction

A Generative Framework for Self-Supervised Facial Representation Learning

Exploring Facial Expression Recognition through Semi-Supervised Pretraining and Temporal Modeling

Accurate Facial Parts Localization and Deep Learning for 3D Facial Expression Recognition

Towards Reading Beyond Faces for Sparsity-Aware 4D Affect Recognition

Automatic facial expression recognition on a single 3D face by exploring shape deformation

2D+3D Facial Expression Recognition via Discriminative Dynamic Range Enhancement and Multi-Scale Learning

Bridging the Gaps: Utilizing Unlabeled Face Recognition Datasets to Boost Semi-Supervised Facial Expression Recognition

A Fine-Grained Facial Expression Database for End-to-End Multi-Pose Facial Expression Recognition

Self-supervised facial expression recognition with fine-grained feature selection

Facial Expression Recognition with Geometric Scattering on 3D Point Clouds

3D-FERNet: A Facial Expression Recognition Network utilizing 3D information

Automatic 2.5-D Facial Landmarking and Emotion Annotation for Social Interaction Assistance

Joint Deep Learning of Facial Expression Synthesis and Recognition