Integrating VideoMAE based model and Optical Flow for Micro- and Macro-expression Spotting

Ke Xu, Kang Chen, Licai Sun, Zheng Lian, Bin Liu, Gong Chen, Haiyang Sun, Mingyu Xu, Jianhua Tao
DOI: https://doi.org/10.1145/3581783.3612868
2023-01-01
Abstract: The task of localizing macro- and micro-expression intervals in long videos has a wide range of applications in human-computer interaction. Compared with macro-expressions, micro-expressions have shorter durations, lower intensity, and fewer samples, which makes them more difficult to spot accurately in long videos. In this paper, we propose combining a pre-trained model with the optical flow method to improve the accuracy and robustness of macro- and micro-expression spotting. First, self-supervised pre-training is performed on rich unlabeled data based on VideoMAE. Then, multiple models are trained on the SAMM-LV and CAS(ME)³ datasets for macro- and micro-expressions at different granularities. Finally, slices of different lengths are generated based on the models with different granularities, and the best pairing of model granularity and slice length is explored. At the same time, the regions where macro- and micro-expressions occur are spotted using the optical flow method and fused with the model outputs, both to supplement the spatio-temporal information not captured by the model and to exclude interference from regions of no interest. We evaluated our method on the MEGC2023 test set (consisting of 10 long videos from SAMM and 20 long videos from CAS(ME)³) and won first place in the MEGC2023 Challenge. The results demonstrate the effectiveness of the method.
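The slicing and fusion steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the slicing stride, the fusion rule, or any thresholds, so everything here is an assumption made for illustration: the helper names (make_slices, fuse_scores, scores_to_intervals), the gating rule that zeroes a model score when the optical-flow magnitude is low, and all numeric values are hypothetical and are not taken from the paper.

```python
def make_slices(num_frames, slice_len, stride):
    """Split a video into (start, end) frame ranges, end exclusive.

    Hypothetical helper: the last slice is shifted back so that the
    final frames of the video are always covered.
    """
    if num_frames <= 0:
        return []
    if num_frames <= slice_len:
        return [(0, num_frames)]
    starts = list(range(0, num_frames - slice_len + 1, stride))
    if starts[-1] + slice_len < num_frames:
        starts.append(num_frames - slice_len)
    return [(s, s + slice_len) for s in starts]


def fuse_scores(model_scores, flow_magnitudes, flow_threshold):
    """Assumed fusion rule: keep a per-frame model score only where the
    optical-flow magnitude reaches flow_threshold, otherwise set it to 0."""
    return [
        m if f >= flow_threshold else 0.0
        for m, f in zip(model_scores, flow_magnitudes)
    ]


def scores_to_intervals(scores, score_threshold, min_len):
    """Turn per-frame scores into (onset, offset) intervals, both inclusive,
    dropping runs shorter than min_len frames."""
    intervals = []
    start = None
    for i, s in enumerate(scores):
        if s >= score_threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                intervals.append((start, i - 1))
            start = None
    if start is not None and len(scores) - start >= min_len:
        intervals.append((start, len(scores) - 1))
    return intervals


# Toy per-frame values, invented purely for illustration.
model_scores = [0.9, 0.8, 0.1, 0.7, 0.9, 0.9]
flow_magnitudes = [1.0, 1.2, 0.1, 0.05, 0.8, 0.9]

fused = fuse_scores(model_scores, flow_magnitudes, flow_threshold=0.5)
intervals = scores_to_intervals(fused, score_threshold=0.5, min_len=2)
```

In this toy run, frame 3 has a high model score but almost no optical flow, so the assumed gating rule suppresses it and the spotted intervals split into two runs. A shorter slice_len in make_slices would suit short micro-expressions and a longer one macro-expressions, which is one plausible reading of the pairing of model granularity and slice length that the abstract mentions.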