Multimodal Fusion-based Swin Transformer for Facial Recognition Micro-Expression Recognition

Xinhua Zhao, Yongjia Lv, Zheng Huang
DOI: https://doi.org/10.1109/ICMA54519.2022.9856162
2022-01-01
Abstract: Micro-expression recognition is an active area of computer vision research. It faces significant challenges because micro-expressions are spontaneous, brief, and faint facial muscle movements. This paper presents a novel multimodal fusion method for micro-expression recognition based on a vision transformer, an architecture not commonly used for this task. Compared with convolutional neural networks, transformers are widely thought to require more data. We therefore pre-train the model on similar expression datasets, thereby increasing the amount of data available for training. Validation and evaluation on the CASME II, MMEW, and SMIC datasets yielded state-of-the-art performance, with average accuracies of 81.50%, 82.97%, and 79.99%, respectively. Facial expression activation heat maps obtained with Score-CAM show that the regions the model attends to match the expression action units well. The proposed model obtains more promising recognition results than many other recognition methods.