Abstract: Facial expression recognition (FER) plays a crucial role in affective computing, enhancing human-computer interaction by enabling machines to understand and respond to human emotions. Despite advances in deep learning, current FER systems still struggle with occlusions, head pose variations, and motion blur in natural environments. To address these challenges, we propose the Attention-Enhanced Multi-Layer Transformer (AEMT) model, which integrates a dual-branch Convolutional Neural Network (CNN), an Attentional Selective Fusion (ASF) module, and a Multi-Layer Transformer Encoder (MTE) with transfer learning. The dual-branch CNN processes RGB and Local Binary Pattern (LBP) features separately, capturing detailed texture and color information. The ASF module applies global and local attention mechanisms to the extracted features, selectively enhancing the relevant ones. The MTE captures long-range dependencies and models the complex relationships between features. Together, these components improve feature representation and classification accuracy. We evaluated the model on the RAF-DB and AffectNet datasets, where it achieved accuracies of 81.45% and 71.23%, respectively, significantly outperforming existing state-of-the-art methods. These results indicate that AEMT effectively addresses the challenges of FER in natural environments, improving the robustness and accuracy of emotion recognition in complex real-world scenarios. This work enhances the capabilities of affective computing systems and opens avenues for future research on model efficiency and multimodal data integration.
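The abstract does not say which LBP variant feeds the second CNN branch. As a minimal illustration only, the sketch below assumes the basic 3x3 operator on a grayscale image: each interior pixel is compared with its eight neighbours, and the comparison results are packed into an 8-bit code. The function name `lbp_image`, the clockwise neighbour ordering, and the `>=` threshold convention are assumptions made for this example, not details taken from the AEMT model.

```python
def lbp_image(gray):
    """Compute the basic 3x3 Local Binary Pattern code for each interior pixel.

    gray: list of equal-length rows of integer intensities.
    Returns a (H-2) x (W-2) grid of codes in the range 0..255.
    """
    # Neighbour offsets, clockwise from the top-left corner.
    # The ordering is a convention chosen for this sketch (assumption).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    height, width = len(gray), len(gray[0])
    codes = []
    for r in range(1, height - 1):
        row = []
        for c in range(1, width - 1):
            center = gray[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbour is at least as bright as the center.
                if gray[r + dr][c + dc] >= center:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes


# Hypothetical 3x3 patch; its single interior pixel has intensity 6.
patch = [
    [5, 9, 1],
    [4, 6, 7],
    [2, 3, 8],
]
codes = lbp_image(patch)
print(codes)  # [[26]]: neighbours 9, 7 and 8 set bits 1, 3 and 4
```

Because each code depends only on the ordering of intensities within the neighbourhood, the resulting map is unchanged by uniform brightness shifts, which is one common reason for pairing an LBP input with an RGB input.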
