Micro-Expression Recognition Based on Attribute Information Embedding and Cross-modal Contrastive Learning

Yanxin Song, Jianzong Wang, Tianbo Wu, Zhangcheng Huang, Jing Xiao
DOI: https://doi.org/10.48550/arXiv.2205.14643
2022-05-29
Abstract: Facial micro-expression recognition has attracted much attention recently. Micro-expressions are short in duration and low in intensity, and it is difficult to train a high-performance classifier with the limited number of existing micro-expression samples. Therefore, recognizing micro-expressions is a challenging task. In this paper, we propose a micro-expression recognition method based on attribute information embedding and cross-modal contrastive learning. We use a 3D CNN to extract RGB features and FLOW features of micro-expression sequences and fuse them, and use a BERT network to extract text information from the Facial Action Coding System. Through a cross-modal contrastive loss, we embed attribute information in the visual network, thereby improving the representation ability of micro-expression recognition when samples are limited. We conduct extensive experiments on the CASME II and MMEW databases, achieving accuracies of 77.82% and 71.04%, respectively. The comparative experiments show that this method achieves better recognition performance than other micro-expression recognition methods.
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The problem this paper addresses is the difficulty of micro-expression recognition: micro-expressions are short in duration and low in intensity, and the small number of existing micro-expression samples makes it hard to train a high-performance classifier. The paper proposes a micro-expression recognition method based on attribute information embedding and cross-modal contrastive learning. By introducing attribute information and applying a cross-modal contrastive loss, it aims to strengthen the representational ability of the recognition network and thereby improve performance when samples are limited.

### Main contributions of the paper:

1. **Introduction of attribute information**: The action units (AUs) in the Facial Action Coding System (FACS) are mapped to corresponding attribute information, which is embedded into the video network to enhance the representational ability of micro-expression recognition.
2. **Cross-modal contrastive learning loss**: A cross-modal contrastive loss is proposed. Optimizing the network with it pulls the different modalities of the same sample closer together and pushes different samples farther apart, so the network learns stronger feature representations.

### Method overview:

- **Video feature extraction network**: A 3D CNN extracts RGB features and FLOW features from the micro-expression sequence and fuses them.
- **Attribute feature extraction network**: A BERT network extracts text information from the FACS coding.
- **Cross-modal contrastive loss**: Positive and negative sample pairs are constructed and the cross-modal contrastive loss is computed over them, so that visual features and attribute information are better combined and recognition improves.
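
The cross-modal contrastive idea described above can be illustrated with a minimal sketch. The summary does not give the paper's exact loss, so the code below assumes a common InfoNCE-style formulation: the visual feature and the attribute (text) feature of the same sample form the positive pair, and the attribute features of the other samples in the batch act as negatives. The function name `cross_modal_contrastive_loss`, the `temperature` parameter, the one-directional (visual-to-text) form, and the toy vectors are all hypothetical choices made for illustration, not the authors' implementation.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cross_modal_contrastive_loss(visual, text, temperature=0.1):
    """InfoNCE-style loss (an assumed formulation, not the paper's exact loss).

    visual[i] and text[i] are the two modalities of sample i (positive pair);
    text[j] with j != i serves as a negative for visual[i].
    """
    v = [l2_normalize(x) for x in visual]
    t = [l2_normalize(x) for x in text]
    total = 0.0
    for i, vi in enumerate(v):
        # Similarity of visual sample i to every attribute embedding in the batch.
        logits = [sum(a * b for a, b in zip(vi, tj)) / temperature for tj in t]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log-probability of picking the matching attribute embedding.
        total += log_denominator - logits[i]
    return total / len(v)


# Toy stand-ins for fused 3D CNN features and BERT attribute features.
visual_feats = [[1.0, 0.0], [0.0, 1.0]]
aligned_text = [[1.0, 0.0], [0.0, 1.0]]   # each sample matches its own text
swapped_text = [[0.0, 1.0], [1.0, 0.0]]   # each sample matches the wrong text

loss_aligned = cross_modal_contrastive_loss(visual_feats, aligned_text)
loss_swapped = cross_modal_contrastive_loss(visual_feats, swapped_text)
print(loss_aligned < loss_swapped)
```

In this sketch the loss is small when each visual feature is closest to its own attribute embedding and large when it is closer to another sample's embedding, which is the behavior the summary attributes to the contrastive objective.
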
### Experimental results:

- On the CASME II and MMEW databases, the method reaches accuracies of 77.82% and 71.04%, respectively, which the paper's comparative experiments report as better than other existing micro-expression recognition methods.

### Conclusion:

By introducing attribute information and cross-modal contrastive learning, the proposed method improves micro-expression recognition performance, especially when the number of samples is limited. It offers new ideas for further research on micro-expression recognition.