Abstract: Features extracted by CNN-based facial expression recognition (FER) methods do not consider the structural information of faces. To address this problem, this paper proposes a regional adaptive correlation network (RACN) that describes facial structure more effectively and enriches the expression feature representation. The network consists of two branches. In one branch, the features obtained by applying a CNN to facial sub-blocks are fed to the proposed second-order regional correlation network (SRCN), which is composed of autocorrelation matrix calculation modules and network layers and obtains structural features by adaptively learning the correlation between facial regions. These regional correlation features are fused with the global features extracted by the parallel branch to obtain a comprehensive, high-level semantic feature representation. Finally, a channel attention mechanism assigns weights to the features of both branches for more accurate expression classification. Experimental results show that the method extracts expression features effectively in an end-to-end manner, improves FER accuracy, and achieves competitive performance without relying on any a priori knowledge. In addition, the region-adaptive correlation feature extraction branch of RACN can be applied to other deep learning networks to extract discriminative, structure-adaptive features.
To the best of our knowledge, this work is the first to enrich the feature representation for end-to-end static FER by combining the autocorrelation matrix with a CNN to adaptively obtain more discriminative regional correlation feature vectors.
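
The two-branch idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the autocorrelation matrix is a normalized Gram matrix of per-region feature vectors, that the structural feature is the upper triangle of that matrix, and that channel attention is a small squeeze-and-excitation style gate. The function names, the toy inputs, and the weights `w1` and `w2` are all hypothetical; in a real network the region features would come from a CNN and every weight would be learned end to end.

```python
import math


def region_correlation(regions):
    """Return an R x R matrix of cosine similarities between region features.

    Assumption: this stands in for the autocorrelation matrix; the paper's
    exact definition may differ.
    """
    # Guard against zero vectors by falling back to a norm of 1.0.
    norms = [math.sqrt(sum(x * x for x in r)) or 1.0 for r in regions]
    return [
        [sum(a * b for a, b in zip(ri, rj)) / (ni * nj)
         for rj, nj in zip(regions, norms)]
        for ri, ni in zip(regions, norms)
    ]


def upper_triangle(mat):
    """Flatten the strictly upper triangle (pairwise region correlations)."""
    n = len(mat)
    return [mat[i][j] for i in range(n) for j in range(i + 1, n)]


def matvec(weights, vec):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def channel_attention(features, w1, w2):
    """Gate each feature channel with a weight in (0, 1).

    Two small fully connected layers (ReLU, then sigmoid) produce one
    gate per channel, in the spirit of squeeze-and-excitation.
    """
    hidden = [max(0.0, h) for h in matvec(w1, features)]
    gates = [1.0 / (1.0 + math.exp(-g)) for g in matvec(w2, hidden)]
    return [g * f for g, f in zip(gates, features)]


# Toy inputs: three facial regions with 2-dimensional features each.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
corr = region_correlation(regions)
structural = upper_triangle(corr)      # structural branch output
global_feat = [0.5, -0.2]              # stand-in for the global branch
fused = structural + global_feat       # concatenate both branches

# Hypothetical fixed weights: 5 channels -> 2 hidden units -> 5 gates.
w1 = [[0.1] * 5, [0.2] * 5]
w2 = [[0.3, -0.3] for _ in range(5)]
weighted = channel_attention(fused, w1, w2)
```

The gated vector `weighted` would then be passed to a classifier. Concatenation followed by a per-channel gate lets the network decide how much each branch contributes to the final prediction.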

Relation and context augmentation network for facial expression recognition

Relation-aware Network for Facial Expression Recognition

Multi-relations Aware Network for in-the-wild Facial Expression Recognition

An Improved SimAM Based CNN for Facial Expression Recognition

Efficient Facial Expression Recognition with Representation Reinforcement Network and Transfer Self-Training for Human–Machine Interaction

Relation-Aware Facial Expression Recognition

Facial Expression Recognition Based on Zero-Addition Pretext Training and Feature Conjunction-Selection Network in Human–Robot Interaction

Enhanced Hybrid Vision Transformer with Multi-Scale Feature Integration and Patch Dropping for Facial Expression Recognition

Facial Expression Recognition Based on Multi-Scale Convolutional Vision Transformer

Facial expression recognition through multi-level features extraction and fusion

Learning Associative Representation for Facial Expression Recognition

Facial Expression Recognition with Visual Transformers and Attentional Selective Fusion

Automatic 4D Facial Expression Recognition via Collaborative Cross-domain Dynamic Image Network

Spatio-Temporal Facial Expression Recognition Using Convolutional Neural Networks and Conditional Random Fields

Facial expression recognition based on regional adaptive correlation

Hybrid Attention-Aware Learning Network for Facial Expression Recognition in the Wild

The Facial Expression Recognition Method Based on Image Fusion and CNN

A convolution-transformer dual branch network for head-pose and occlusion facial expression recognition

Facial Expression Recognition Based on Fine-Tuned Channel–Spatial Attention Transformer

CF-DAN: Facial-expression recognition based on cross-fusion dual-attention network