Attention-Rectified and Texture-Enhanced Cross-Attention Transformer Feature Fusion Network for Facial Expression Recognition

Mingyi Sun, Weigang Cui, Yue Zhang, Shuyue Yu, Xiaofeng Liao, Bin Hu, Yang Li
DOI: https://doi.org/10.1109/tii.2023.3253188
IF: 12.3
2023-01-01
IEEE Transactions on Industrial Informatics
Abstract: Facial expression recognition (FER) in the wild is a challenging task for affective computing in human–machine interaction. Most existing methods train with a simple cross-entropy loss and, because of the class imbalance common in FER datasets, fail to learn the most prominent regions of facial images, which limits the robustness and interpretability of the model. In addition, these methods capture only local features of the original images with multisize shallow convolutions and ignore facial texture characteristics, leading to suboptimal recognition performance. To address these issues, in this article, we propose a novel FER network, named the attention-rectified and texture-enhanced cross-attention transformer feature fusion network (AR-TE-CATFFNet). First, an attention-rectified convolution block is designed to help multiple convolution heads focus on the critical areas of human faces and to improve model generalization. Second, we investigate a texture enhancement block that captures texture features through the local binary pattern and the gray-level co-occurrence matrix, which addresses the lack of texture information. Finally, a cross-attention transformer feature fusion block is employed to deeply integrate red, green, blue (RGB) features and texture features globally, which helps boost recognition accuracy. Experimental results on three public datasets validate the efficacy of the proposed method, which achieves classification performance of 89.50% on the real-world affective faces database (RAF-DB), 65.66% on AffectNet, and 74.84% on FER2013, outperforming existing methods.
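For readers unfamiliar with the local binary pattern mentioned in the abstract, the following is a minimal sketch of the basic 3x3 variant in plain Python. The abstract does not say which LBP variant, neighbourhood size, or bit ordering the authors use, so the function name `lbp_3x3`, the clockwise bit order, and the `>=` comparison are all assumptions made for illustration, not the paper's implementation.

```python
def lbp_3x3(img):
    """Basic 8-neighbour LBP codes for the interior pixels of a 2D grayscale image.

    `img` is a list of equal-length rows of numbers. Hypothetical helper for
    illustration only; not taken from the paper.
    """
    # Neighbour offsets (row, col), clockwise from the top-left corner.
    # The bit order is an assumed convention; other orderings are equally valid.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    height, width = len(img), len(img[0])
    codes = []
    for r in range(1, height - 1):
        row = []
        for c in range(1, width - 1):
            center = img[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbour is at least as bright as the center.
                if img[r + dr][c + dc] >= center:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes
```

Each interior pixel is replaced by an 8-bit code describing which neighbours are at least as bright as it is; a histogram of these codes is a common texture descriptor. Border pixels are skipped here for simplicity, so the output is two rows and two columns smaller than the input.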