Abstract: Facial expressions provide insight into people's mental states and their attitudes towards specific things. However, facial occlusion is common in the real world and substantially degrades the performance of facial expression recognition models. Recent works addressing the occlusion problem have primarily relied on attention mechanisms or occlusion-discarding methods that focus on non-occluded regions of the face, but these methods have not achieved a good balance between occlusion robustness and model efficiency. In this paper, we propose a simple and efficient model, called FERMixNet, for occluded facial expression recognition. The model incorporates a novel facial mixing augmentation strategy (FERMix) that generates new training samples by simulating real-world facial occlusion while preserving highly expression-related semantic information. By co-training on the original and newly generated samples, the model's occlusion robustness is improved without increasing its complexity during inference. To further enhance occlusion robustness, we include mid-level representation learning in the network to learn discriminative non-occluded local features of the samples at low computational cost. Extensive experiments on the public facial occlusion datasets Occlusion-RAF-DB, Occlusion-FERPlus, and FED-RO show that the proposed model achieves state-of-the-art results, demonstrating its robustness for occluded facial expression recognition. The proposed model also achieves state-of-the-art results on the in-the-wild facial expression datasets RAF-DB, AffectNet-8, and AffectNet-7, which suggests that it has good application prospects in the real world.
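The abstract does not spell out how FERMix builds its mixed samples, so the following is only a minimal sketch of the general idea of occlusion-simulating augmentation with co-training, not the authors' method. It assumes a simple patch-paste scheme: a small random rectangle from an occluder image replaces the same region of a face image, and the mixed sample keeps the original expression label. The names `paste_occluder`, `co_training_batch`, and the `max_ratio` parameter are hypothetical, and images are plain nested lists so the example needs no external libraries.

```python
import random


def paste_occluder(face, occluder, max_ratio=0.3, rng=None):
    """Return a copy of `face` with one random rectangle taken from `occluder`.

    Both images are H x W nested lists of pixel values with the same shape.
    `max_ratio` (hypothetical) caps the patch height and width as a fraction
    of the image side, so most of the face stays visible.
    """
    rng = rng or random.Random()
    height, width = len(face), len(face[0])
    if len(occluder) != height or len(occluder[0]) != width:
        raise ValueError("face and occluder must have the same shape")

    # Sample the patch size first, then a position that keeps it inside the image.
    patch_h = rng.randint(1, max(1, int(height * max_ratio)))
    patch_w = rng.randint(1, max(1, int(width * max_ratio)))
    top = rng.randint(0, height - patch_h)
    left = rng.randint(0, width - patch_w)

    mixed = [row[:] for row in face]  # copy rows so the input is not modified
    for r in range(top, top + patch_h):
        mixed[r][left:left + patch_w] = occluder[r][left:left + patch_w]
    return mixed, (top, left, patch_h, patch_w)


def co_training_batch(samples, occluders, rng=None):
    """Pair every (image, label) sample with one occluded copy that keeps its label."""
    rng = rng or random.Random()
    batch = []
    for image, label in samples:
        batch.append((image, label))  # original sample
        mixed, _ = paste_occluder(image, rng.choice(occluders), rng=rng)
        batch.append((mixed, label))  # simulated-occlusion sample, same label
    return batch
```

Because the augmentation only changes the training data, a model trained this way runs unchanged at inference time, which is consistent with the abstract's claim that robustness improves without added inference complexity.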

FFNet-M - Feature Fusion Network with Masks for Multimodal Facial Expression Recognition

AFNet-M: Adaptive Fusion Network with Masks for 2D+3D Facial Expression Recognition

CMANET: Curvature-Aware Soft Mask Guided Attention Fusion Network for 2D+3D Facial Expression Recognition

Multimodal 2D+3D Facial Expression Recognition with Deep Fusion Convolutional Neural Network

3-D Facial Expression Recognition via Attention-Based Multichannel Data Fusion Network

3D-FERNet: A Facial Expression Recognition Network utilizing 3D information

Automatic 4D Facial Expression Recognition via Collaborative Cross-domain Dynamic Image Network

Towards Reading Beyond Faces for Sparsity-aware 3D/4D Affect Recognition

Landmarks-assisted Collaborative Deep Framework for Automatic 4D Facial Expression Recognition

A Dual-Direction Attention Mixed Feature Network for Facial Expression Recognition

Multi-Net Fusion: Exploring a Brain-Inspired Neural Network Model for Facial Expression Recognition

Facial expression recognition through multi-level features extraction and fusion

Dynamic Multi-Channel Metric Network for Joint Pose-Aware and Identity-Invariant Facial Expression Recognition

A Lightweight Attention-based Deep Network via Multi-Scale Feature Fusion for Multi-View Facial Expression Recognition

Multi-Head Attention Affinity Diversity Sharing Network for Facial Expression Recognition

Feature fusion of multi-granularity and multi-scale for facial expression recognition

FERMixNet: an Occlusion Robust Facial Expression Recognition Model with Facial Mixing Augmentation and Mid-Level Representation Learning

MFFNet: Multimodal Feature Fusion Network for Point Cloud Semantic Segmentation

CF-DAN: Facial-expression recognition based on cross-fusion dual-attention network

Fusing Normal Vector and Curvature Features on the Mesh for 3D Facial Expression Recognition