Dual-Branch Cross-Attention Network for Micro-Expression Recognition with Transformer Variants

Zhihua Xie,Chuwei Zhao
DOI: https://doi.org/10.3390/electronics13020461
IF: 2.9
2024-01-23
Electronics
Abstract:A micro-expression (ME), as a spontaneous facial expression, usually occurs instantaneously and is difficult to disguise after an emotion-evoking event. Numerous convolutional neural network (CNN)-based models have been widely explored to recognize MEs for their strong local feature representation ability on images. However, the main drawback of the current methods is their inability to fully extracting holistic contextual information from ME images. To achieve efficient ME learning representation from diverse perspectives, this paper uses Transformer variants as the main backbone and the dual-branch architecture as the main framework to extract meaningful multi-modal contextual features for ME recognition (MER). The first branch leverages an optical flow operator to facilitate the motion information extraction between ME sequences, and the corresponding optical flow maps are fed into the Swin Transformer to acquire motion–spatial representation. The second branch directly sends the apex frame in one ME clip to Mobile ViT (Vision Transformer), which can capture the local–global features of MEs. More importantly, to achieve the optimal feature stream fusion, a CAB (cross attention block) is designed to interact the feature extracted by each branch for adaptive learning fusion. The extensive experimental comparisons on three publicly available ME benchmarks show that the proposed method outperforms the existing MER methods and achieves an accuracy of 81.6% on the combined database.
engineering, electrical & electronic,computer science, information systems,physics, applied
What problem does this paper attempt to address?