TGMAE: Self-supervised Micro-Expression Recognition with Temporal Gaussian Masked Autoencoder

Shifeng Liu, Xinglong Mao, Sirui Zhao, Chaoyou Fu, Ying Yu, Tong Xu, Enhong Chen
DOI: https://doi.org/10.1109/icme57554.2024.10687556
2024-01-01
Abstract: Micro-expressions (MEs) are fleeting, subtle, and involuntary facial expressions that can reveal genuine emotions of human beings. Although many advanced supervised deep learning efforts have been devoted to ME recognition (MER), they are severely limited by the lack of sufficient well-labeled ME data when learning discriminative ME features. To address this problem, we propose a novel self-supervised ME representation learning method based on Temporal Gaussian Masked Autoencoder, termed TGMAE. Specifically, a Temporal Gaussian Masking strategy is customized to construct a challenging spatiotemporal ME movement reconstruction task, which can effectively assist the model in perceiving ME features from abundant unlabeled ME data. Additionally, to bridge the semantic gap between encoded features for reconstruction and emotion features for recognition, a bridging classifier is introduced for downstream MER. Comprehensive experiments demonstrate the remarkable performance of TGMAE, significantly surpassing the second-best method with a maximum improvement of 4.96% in UF1 and 6.92% in UAR.
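The abstract reports gains in UF1 and UAR. As these metrics are commonly defined, UF1 (unweighted F1) is the per-class F1 score averaged equally over classes, and UAR (unweighted average recall) is the per-class recall averaged equally over classes, so minority emotion classes count as much as majority ones. The following is a minimal sketch of those standard definitions, not the authors' evaluation code; the function name `uf1_uar` and the toy labels are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def uf1_uar(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (UF1, UAR), each averaged equally over the classes in y_true.

    Illustrative helper (hypothetical name), assuming the usual definitions:
    UF1 = mean per-class F1, UAR = mean per-class recall.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")

    classes = sorted(set(y_true))
    f1_scores = []
    recalls = []
    for c in classes:
        # Per-class confusion counts.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)

        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit.
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)

        # tp + fn > 0 is guaranteed because c was drawn from y_true.
        recalls.append(tp / (tp + fn))

    n = len(classes)
    return sum(f1_scores) / n, sum(recalls) / n


if __name__ == "__main__":
    # Toy, made-up labels for three emotion classes (0, 1, 2).
    y_true = [0, 0, 0, 0, 1, 2]
    y_pred = [0, 0, 0, 0, 0, 2]
    uf1, uar = uf1_uar(y_true, y_pred)
    print(f"UF1={uf1:.4f} UAR={uar:.4f}")
```

On the toy labels, plain accuracy is 5/6 even though class 1 is never predicted correctly, whereas UF1 and UAR are both pulled down by that missed class. This sensitivity to minority classes is why class-balanced metrics are preferred on small, imbalanced ME datasets.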