Dual-STI: Dual-path spatial-temporal interaction learning for dynamic facial expression recognition

Min Li,Xiaoqin Zhang,Chenxiang Fan,Tangfei Liao,Guobao Xiao
DOI: https://doi.org/10.1016/j.ins.2024.120953
IF: 8.1
2024-06-24
Information Sciences
Abstract:Learning facial evaluation is crucial for dynamic facial expression recognition. Current recognition methods typically extract temporal features after spatial features to achieve low computation complexity. However, these methods struggle to model complex facial evaluations due to a lack of interaction between spatial and temporal features. This paper proposes a novel Dual-path Spatial-Temporal Interaction (Dual-STI) framework that concurrently extracts spatial and temporal features through two efficient paths. Specifically, Dual-STI comprises a spatial path and a temporal path. The spatial path contains several spatial transformers to capture robust facial features from each sampled frame, while the temporal path includes several temporal transformers to learn rich contextual facial features from the sequence of frames. To facilitate spatial-temporal interaction, Dual-STI features a distinct dual-path interaction module that adaptively fuses spatial and temporal features by combining spatial and temporal attention mechanisms. Additionally, comparative learning is introduced into the loss function to enhance this interaction. To evaluate the proposed method, extensive experiments are conducted on three popular benchmarks, namely DFEW, AFEW, and FERV39k. The experimental results demonstrate that the proposed Dual-STI achieves state-of-the-art performance with low computational complexity across all datasets. Notably, Dual-STI shows significant improvements in the "disgust" and "fear" categories, with precision increases of 3.45% and 2.1% on the DFEW dataset, respectively.
computer science, information systems
What problem does this paper attempt to address?