Facial Action Unit Recognition Based on Self-Attention Spatiotemporal Fusion
Chaolei Liang, Wei Zou, Danfeng Hu, JiaJun Wang
DOI: https://doi.org/10.1145/3670105.3670210
2024-01-01
Abstract: Facial Action Units (AUs) serve as precise descriptors of facial expressions, revealing an individual's psychological and mental state. AU detection therefore plays an important role in facial expression recognition. Existing methods often focus on extracting intra-frame information while paying less attention to inter-frame feature changes. To address this issue, this paper proposes a self-attention spatiotemporal fusion method (SAtt-STPN). In this method, a feature extractor (AFE) is designed to extract uniform feature information from both strongly and weakly correlated regions. A spatiotemporal perception (STP) module is designed to capture temporal information for each AU through mutually driven independent branches in the spatial and temporal dimensions, while a graph convolutional network is adopted to model intra-frame AU relationships (ARM). Finally, intra-frame and inter-frame information are weighted and fused for classification. Experimental results on two public datasets (BP4D and DISFA) show that the proposed SAtt-STPN outperforms state-of-the-art methods in facial AU detection.
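The abstract names two generic building blocks, a graph convolution over AU nodes and a weighted fusion of intra-frame and inter-frame features, without giving formulas. The following is only a minimal sketch of those two ideas under stated assumptions, not the authors' implementation: the row-normalized adjacency with self-loops, the single scalar fusion weight `alpha`, the shared linear classifier, and all function names are hypothetical choices made for illustration.

```python
import math


def normalize_adjacency(adj):
    """Add self-loops and row-normalize an AU adjacency matrix (assumed scheme)."""
    n = len(adj)
    with_loops = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                  for i in range(n)]
    return [[v / sum(row) for v in row] for row in with_loops]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def graph_conv(adj_norm, feats, weight):
    """One graph-convolution step: ReLU(A_norm @ X @ W), one row per AU."""
    out = matmul(matmul(adj_norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]


def fuse_and_classify(intra, inter, alpha, clf_weight):
    """Blend per-AU intra-frame and inter-frame features, then apply a sigmoid.

    alpha is a hypothetical scalar fusion weight in [0, 1]; the paper's
    actual weighting scheme is not described in the abstract.
    """
    probs = []
    for f_intra, f_inter in zip(intra, inter):
        fused = [alpha * a + (1.0 - alpha) * b
                 for a, b in zip(f_intra, f_inter)]
        logit = sum(w * x for w, x in zip(clf_weight, fused))
        probs.append(1.0 / (1.0 + math.exp(-logit)))
    return probs


# Toy example: 3 AUs with 2-dimensional features (values are made up).
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
spatial = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
temporal = [[0.2, 0.1], [0.0, 0.3], [0.5, 0.5]]
identity = [[1.0, 0.0], [0.0, 1.0]]

intra_frame = graph_conv(normalize_adjacency(adjacency), spatial, identity)
au_probs = fuse_and_classify(intra_frame, temporal, alpha=0.5,
                             clf_weight=[1.0, -1.0])
```

Here `au_probs` holds one activation probability per AU. A learned, per-AU fusion weight would be a natural alternative to the fixed scalar used in this sketch.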