Abstract: Unlike generic image classification, facial expression classification is a fine-grained task: multiple expressions share inherently similar underlying facial appearances, so the differences between expression classes can be small. Unlike lab-controlled data, facial expressions from natural scenes take many forms for the same expression because of the diversity of subjects and the complexity of real-world conditions; as a result, samples within the same class can differ greatly. Moreover, an expression is displayed simultaneously through several facial regions, which requires encoding the features of multiple key regions to form high-order interactive information. To address these problems, we design an enhanced capsule network based on a multi-level feature fusion attention mechanism. It comprises four critical components: a multi-level feature extraction module (MFEM), a multi-level attention module (MAM), a multi-level capsule attention fusion module (MCAFM), and a reconstruction module (RM). The MFEM collects low-level, middle-level, and high-level features from the input image, reducing the susceptibility of the high-level convolution layers to blurred images and pose variation. The MAM directs the network's attention to the most significant features at each level of image features, helping the network ignore blurred, occluded, and irrelevant features; the attended features are incorporated into our self-attention center loss function to compress the distribution of elements within the same class. The MCAFM preserves the attributes of each facial region (such as location, size, and direction) by converting the features into capsules in preparation for the dynamic routing mechanism, which addresses the problem of image rotation in facial expression recognition (FER) in the wild. At the same time, the capsule features of distinct regions are combined to provide higher-order overall feature information, enhancing the model's capacity to discriminate between different kinds of expressions. The RM reconstructs the image and calculates the difference between the reconstructed image and the original input image. Our model outperforms a large number of current methods on two public datasets, RAF-DB and SFEW.
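
The dynamic routing step that the capsules are prepared for can be illustrated with the standard routing-by-agreement procedure from the capsule network literature. The following is a minimal pure-Python sketch under that assumption, not the paper's implementation: the function names (`squash`, `dynamic_routing`), the default of three routing iterations, and the toy input are all illustrative, and the MCAFM described above may differ in its details.

```python
import math


def squash(v):
    """Squashing nonlinearity: keeps the direction of v, maps its length into [0, 1)."""
    sq = sum(x * x for x in v)
    if sq == 0.0:
        return [0.0 for _ in v]
    scale = sq / (1.0 + sq) / math.sqrt(sq)
    return [scale * x for x in v]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat, num_iters=3):
    """Route input capsules to output capsules by agreement.

    u_hat[i][j] is the prediction vector that input capsule i makes
    for output capsule j (hypothetical layout chosen for this sketch).
    Returns one squashed output vector per output capsule.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    v = []
    for _ in range(num_iters):
        # Coupling coefficients: each input capsule distributes itself over outputs.
        c = [softmax(row) for row in b]
        v = []
        for j in range(n_out):
            # Weighted sum of the predictions for output capsule j, then squash.
            s = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
            v.append(squash(s))
        # Raise the logit where a prediction agrees with the resulting output.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v


# Toy input: both input capsules agree on output 0 and disagree on output 1.
u_hat = [
    [[1.0, 0.0], [0.0, 0.1]],
    [[1.0, 0.0], [0.0, -0.1]],
]
outputs = dynamic_routing(u_hat)
lengths = [math.sqrt(sum(x * x for x in vec)) for vec in outputs]
```

In this toy case the output capsule whose predictions agree ends up with the longer vector. In a capsule network, that length is read as the strength of evidence for the corresponding entity.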

Facial Expression Recognition Through Cross-Modality Attention Fusion

Emotion Recognition Using Cross-Modal Attention from EEG and Facial Expression

A Cross-Modal Fusion Network Based on Self-Attention and Residual Structure for Multimodal Emotion Recognition

Multimodal Attention Dynamic Fusion Network for Facial Micro-Expression Recognition

Facial Emotion Recognition Combining Auxiliary Classifiers and Multiscale CBAM Attention Mechanisms

TriCAFFNet: A Tri-Cross-Attention Transformer with a Multi-Feature Fusion Network for Facial Expression Recognition

Facial Attention based Convolutional Neural Network for 2D+3D Facial Expression Recognition

Attention mechanism-based CNN for facial expression recognition

3-D Facial Expression Recognition via Attention-Based Multichannel Data Fusion Network

Multi-modal Facial Expression Feature Based on Deep-Neural Networks

Cross-domain Facial Expression Recognition Via an Intra-Category Common Feature and Inter-Category Distinction Feature Fusion Network

Attention-Rectified and Texture-Enhanced Cross-Attention Transformer Feature Fusion Network for Facial Expression Recognition

Multi-level Feature Fusion Capsule Network with Self-Attention for Facial Expression Recognition

Multimodal 2D+3D Facial Expression Recognition with Deep Fusion Convolutional Neural Network

Facial Expression Recognition With Fusion Features Extracted From Salient Facial Areas

Facial Expression Recognition in the Wild Based on Multimodal Texture Features

A Cross-Attention Emotion Recognition Algorithm Based on Audio and Video Modalities

The Facial Expression Recognition Method Based on Image Fusion and CNN

A facial expression recognition network based on attention double branch enhanced fusion

A Robust Facial Expression Recognition Algorithm Based on Multi-Rate Feature Fusion Scheme