Facial Expression Recognition Through Cross-Modality Attention Fusion

Rongrong Ni, Biao Yang, Xu Zhou, Angelo Cangelosi, Xiaofeng Liu
DOI: https://doi.org/10.1109/TCDS.2022.3150019
IF: 4.546
2023-01-01
IEEE Transactions on Cognitive and Developmental Systems
Abstract: Facial expressions are generally recognized using handcrafted or deep-learning-based features extracted from RGB facial images. However, such methods are sensitive to illumination and pose variations, and in particular they often fail to recognize expressions with weak emotion intensities. In this work, we propose a cross-modality attention-based convolutional neural network (CM-CNN) for facial expression recognition. We extract expression-related features from complementary facial images (gray-scale, local binary pattern (LBP), and depth images) to handle illumination and pose variations and to capture the appearance details that characterize expressions with weak emotion intensities. Rather than directly concatenating the complementary features, we propose a novel cross-modality attention fusion network that enhances the spatial correlations between each pair of facial image types. Finally, the CM-CNN is optimized with an improved focal loss that pays more attention to facial expressions with weak emotion intensities. The average classification accuracies on VT-KFER, BU-3DFE (P1), BU-3DFE (P2), and Bosphorus are 93.86%, 88.91%, 87.28%, and 85.16%, respectively. Evaluations on these databases demonstrate that our approach is competitive with state-of-the-art algorithms.
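The abstract does not specify how the focal loss was "improved", so the sketch below shows only the standard multi-class focal loss (Lin et al., 2017), FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), as a reference point for the idea of down-weighting easy samples. It is not the authors' loss. The function names (`log_softmax`, `focal_loss`, `mean_focal_loss`) and the defaults `gamma=2.0`, `alpha=1.0` are illustrative assumptions. With `gamma=0` the loss reduces to ordinary cross-entropy.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax over one vector of class scores."""
    m = max(logits)
    log_sum = math.log(sum(math.exp(z - m) for z in logits))
    return [z - m - log_sum for z in logits]


def focal_loss(logits: Sequence[float], target: int,
               gamma: float = 2.0, alpha: float = 1.0) -> float:
    """Standard focal loss for a single sample (not the paper's improved variant).

    The modulating factor (1 - p_t) ** gamma shrinks the loss of samples the
    model already classifies confidently, so hard samples dominate training.
    """
    log_p_t = log_softmax(logits)[target]
    p_t = math.exp(log_p_t)
    return -alpha * (1.0 - p_t) ** gamma * log_p_t


def mean_focal_loss(batch_logits: Sequence[Sequence[float]],
                    targets: Sequence[int],
                    gamma: float = 2.0, alpha: float = 1.0) -> float:
    """Average focal loss over a batch of samples."""
    if len(batch_logits) != len(targets) or not targets:
        raise ValueError("batch_logits and targets must be non-empty and equal length")
    losses = [focal_loss(z, t, gamma, alpha) for z, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Hypothetical logits for a 3-class problem; class 0 is correct in both cases.
    easy = [4.0, 0.0, 0.0]  # confidently correct sample
    hard = [0.5, 0.0, 0.0]  # barely correct sample (e.g. a weak-intensity expression)
    for name, z in (("easy", easy), ("hard", hard)):
        ce = focal_loss(z, 0, gamma=0.0)  # gamma = 0 gives plain cross-entropy
        fl = focal_loss(z, 0, gamma=2.0)
        print(f"{name}: CE={ce:.4f}  FL={fl:.6f}  FL/CE={fl / ce:.4f}")
```

Running the demo shows that the focal term keeps a much larger fraction of the cross-entropy for the barely-correct sample than for the confident one, which is the general mechanism the paper's variant is described as exploiting for weak-intensity expressions.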
What problem does this paper attempt to address?