Multi-Scale Attention Learning Network for Facial Expression Recognition

Qian Dong, Weihong Ren, Yu Gao, Weibo Jiang, Honghai Liu
DOI: https://doi.org/10.1109/lsp.2023.3336257
2023-01-01
IEEE Signal Processing Letters
Abstract: Facial Expression Recognition (FER) aims to identify emotional expressions in human faces and is a fundamental task in computer vision. Recently, several methods have applied Vision Transformer (ViT) to FER and achieved promising results. However, FER still suffers from two key issues: inter-class similarity and intra-class discrepancy. To address these issues, in this letter we propose a Multi-Scale Attention Learning Network (MALN) based on ViT, which learns facial expression embeddings in a multi-scale manner. Specifically, we adopt a multi-branch ViT architecture to adaptively explore multi-scale correlations with self-attention. Furthermore, we design a Scale Distinction Loss (SDL) to dynamically regulate facial embeddings from multiple branches, which guides ViT to capture discriminative facial regions. Experimental results on three public datasets (RAF-DB, AffectNet, and FERPlus) demonstrate the effectiveness of the proposed MALN for FER.