Abstract: Facial expression recognition (FER) in the wild is an exceedingly challenging computer vision task because of subtle differences between expressions, pose variation, occlusion, label bias, and other uncontrollable factors. CNN-based networks are susceptible to these factors and therefore struggle to extract highly discriminative features from the key expression regions; moreover, most methods learn in a single feature space and may not fully capture the core regions of interest. These limitations make it harder to handle the intra-class variability and inter-class similarity of expressions, which ultimately degrades recognition performance. We therefore propose an effective multi-head parallel channel-spatial attention network (MPCSAN) for FER in the wild, which consists of a feature aggregation network (FAN), a multi-head parallel attention network (MPAN), and an expression forecasting network (EFN). First, the lightweight FAN extracts basic expression features while optimizing the intra-class and inter-class distributions. Then, MPAN forms a multi-attention subspace through a multi-head parallel channel-spatial attention fusion design and, by minimizing duplicate attention during subspace fusion, focuses on more accurate and comprehensive expression regions of interest. Finally, EFN performs the final expression classification under label softening, which further improves robustness to label bias. We evaluate the proposed method on the three most widely used in-the-wild expression datasets (RAF-DB, FERPlus, and AffectNet). Extensive experimental results demonstrate that our method outperforms several current state-of-the-art methods, achieving accuracies of 90.16% on RAF-DB, 89.91% on FERPlus, and 61.58% on AffectNet. Evaluations on occlusion and pose-variation datasets, together with a cross-dataset assessment, further demonstrate the strong overall performance of our method.
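The abstract does not say how the labels are softened, so the following is only a minimal sketch of one common scheme, uniform label smoothing, and not necessarily the EFN procedure. The function names, the smoothing factor `epsilon=0.1`, and the seven-class setup are illustrative assumptions, not values taken from the paper.

```python
import math


def soften_labels(true_class, num_classes, epsilon=0.1):
    """Uniform label smoothing (an assumed scheme, not the paper's own).

    The true class receives (1 - epsilon) + epsilon / K and every other
    class receives epsilon / K, so the target still sums to 1.
    """
    if not 0 <= true_class < num_classes:
        raise ValueError("true_class must be in [0, num_classes)")
    base = epsilon / num_classes
    target = [base] * num_classes
    target[true_class] += 1.0 - epsilon
    return target


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_cross_entropy(logits, target):
    """Cross-entropy between a soft target distribution and softmax(logits)."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


# Hypothetical example: 7 expression classes, annotated class index 2.
target = soften_labels(true_class=2, num_classes=7, epsilon=0.1)
loss = soft_cross_entropy([0.2, -1.0, 3.1, 0.0, 0.5, -0.3, 1.2], target)
```

Because the softened target never assigns full probability to a single class, a mislabeled sample pulls the classifier less strongly toward the wrong expression than a one-hot target would, which is the intuition behind using softened labels against label bias.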

Deep Global Multiple-Scale and Local Spatial-Channel Attention Dual-Branch Network for Pose-Invariant Facial Expression Recognition

Deep Spatial and Channel Sliding Attention Patches for Pose-invariant Facial Expression Recognition

Soft Thresholding Squeeze-and-excitation Network for Pose-Invariant Facial Expression Recognition

Joint spatial and scale attention network for multi-view facial expression recognition

Using Attention Lsgb Network for Facial Expression Recognition

Faceknow: Facial Expression Recognition by a Global-Local Network with a Sub-Images-Related Contextual Attention Mechanism

Dynamic Multi-Channel Metric Network for Joint Pose-Aware and Identity-Invariant Facial Expression Recognition

Two-pathway attention network for real-time facial expression recognition

Multi-Attention Module for Dynamic Facial Emotion Recognition

Facial Attention based Convolutional Neural Network for 2D+3D Facial Expression Recognition

A Lightweight Attention-based Deep Network via Multi-Scale Feature Fusion for Multi-View Facial Expression Recognition

Pose-adaptive Hierarchical Attention Network for Facial Expression Recognition

Two-stream Global-Guided Attention Network for Facial Expression Recognition

MPCSAN: Multi-Head Parallel Channel-Spatial Attention Network for Facial Expression Recognition in the Wild

Region Attention Networks for Pose and Occlusion Robust Facial Expression Recognition

A Multi-Scale Feature Fusion Network for Facial Expression Recognition

A convolution-transformer dual branch network for head-pose and occlusion facial expression recognition

Geometry Guided Pose-Invariant Facial Expression Recognition

A Fine-Grained Facial Expression Database for End-to-End Multi-Pose Facial Expression Recognition

Expression Recognition Based on Multi-feature Fusion of Peak-neutral Differences

Multi-level Spatial and Semantic Enhancement Network for Expression Recognition