Abstract: Facial expressions help individuals convey their emotions. In recent years, thanks to advances in computer vision, facial expression recognition (FER) has become a research hotspot and has made remarkable progress. However, human faces in real-world environments are affected by unfavorable factors, such as facial occlusion and head pose changes, that are seldom encountered in controlled laboratory settings. These factors often reduce expression recognition accuracy. Inspired by the recent success of transformers in many computer vision tasks, we propose a model called the fine-tuned channel–spatial attention transformer (FT-CSAT) to improve the accuracy of FER in the wild. FT-CSAT consists of two crucial components: a channel–spatial attention module and a fine-tuning module. In the channel–spatial attention module, the feature map is passed through the channel attention module and then the spatial attention module, so the final output feature map incorporates both channel information and spatial information. Consequently, the network becomes adept at focusing on relevant and meaningful features associated with facial expressions. To further improve the model's performance while limiting the number of additional parameters, we employ a fine-tuning method. Extensive experimental results demonstrate that FT-CSAT outperforms state-of-the-art methods on two benchmark datasets, RAF-DB and FERPlus, achieving recognition accuracies of 88.61% and 89.26%, respectively. Furthermore, to evaluate the robustness of FT-CSAT under facial occlusion and head pose changes, we conduct tests on the Occlusion-RAF-DB and Pose-RAF-DB datasets, and the results also show the superior recognition performance of the proposed method under such conditions.
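The sequential channel-then-spatial weighting described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the internals of either module, so the sketch assumes average and max pooling as inputs to each attention step and replaces any learned layers with a fixed sum followed by a sigmoid. All function names here are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap):
    """Return one weight in (0, 1) per channel.

    fmap is a list of C channels, each an H x W list of lists.
    """
    weights = []
    for channel in fmap:
        values = [v for row in channel for v in row]
        avg_pool = sum(values) / len(values)
        max_pool = max(values)
        # Assumption: a trained module would pass the pooled values
        # through learned layers here; this sketch just sums them.
        weights.append(sigmoid(avg_pool + max_pool))
    return weights


def spatial_attention(fmap):
    """Return an H x W mask with one weight in (0, 1) per position."""
    num_channels = len(fmap)
    height, width = len(fmap[0]), len(fmap[0][0])
    mask = []
    for i in range(height):
        mask_row = []
        for j in range(width):
            column = [fmap[c][i][j] for c in range(num_channels)]
            avg_pool = sum(column) / num_channels
            max_pool = max(column)
            # Assumption: a learned convolution would normally mix
            # neighbouring positions here; this sketch omits it.
            mask_row.append(sigmoid(avg_pool + max_pool))
        mask.append(mask_row)
    return mask


def channel_spatial_attention(fmap):
    """Apply channel attention first, then spatial attention."""
    channel_weights = channel_attention(fmap)
    reweighted = [
        [[v * w for v in row] for row in channel]
        for channel, w in zip(fmap, channel_weights)
    ]
    mask = spatial_attention(reweighted)
    return [
        [[v * m for v, m in zip(row, mask_row)]
         for row, mask_row in zip(channel, mask)]
        for channel in reweighted
    ]


# Toy feature map: 2 channels, 2 x 2 spatial positions.
feature_map = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, -1.0], [0.5, 0.0]],
]
output = channel_spatial_attention(feature_map)
```

The output has the same shape as the input, and because every weight lies strictly between 0 and 1, each activation is scaled down rather than replaced. In a trained model the weights would come from learned parameters, which is what would let the network emphasize expression-relevant channels and regions.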

A Novel Approach of Driver Facial Expression Recognition Based on Improved Swin Transformer

DR-FER: Discriminative and Robust Representation Learning for Facial Expression Recognition

cGAN Based Facial Expression Recognition for Human-Robot Interaction

Efficient Facial Expression Recognition with Representation Reinforcement Network and Transfer Self-Training for Human–Machine Interaction

Facial Expression Recognition Based on Multi-Scale Convolutional Vision Transformer

Facial Expression Recognition Based on Zero-Addition Pretext Training and Feature Conjunction-Selection Network in Human–Robot Interaction

On-Road Driver Emotion Recognition Using Facial Expression

The Extensive Usage of the Facial Image Threshing Machine for Facial Emotion Recognition Performance

Facial Expression Recognition Based on Fine-Tuned Channel–Spatial Attention Transformer

Facial Expression Recognition using Squeeze and Excitation-powered Swin Transformers

Facial Expression Recognition With Visual Transformers and Attentional Selective Fusion

FER-former: Multi-modal Transformer for Facial Expression Recognition

A novel driver emotion recognition system based on deep ensemble classification

An Improved SimAM Based CNN for Facial Expression Recognition

FPIRST: Fatigue Driving Recognition Method Based on Feature Parameter Images and a Residual Swin Transformer

Fine-Grained Facial Expression Recognition in Multiple Smiles

Driver Multi-task Emotion Recognition Network Based on Multi-modal Facial Video Analysis

Enhanced Hybrid Vision Transformer with Multi-Scale Feature Integration and Patch Dropping for Facial Expression Recognition

The Facial Expression Recognition Method Based on Image Fusion and CNN

Facial Expression Recognition by Expression-Specific Representation Swapping