Abstract: Objective Micro-expressions are brief, subtle facial muscle movements that involuntarily signal emotions when a person tries to hide their true inner feelings. Micro-expressions reflect a person's true feelings and motivations more faithfully than macro-expressions. Micro-expression recognition aims to automatically analyze and identify the emotional category of the subject from the movements of the facial muscles, which has important application value in lie detection, psychological diagnosis, and other areas. In the early development of micro-expression recognition, local binary patterns and optical flow were widely used as features for training traditional machine learning models. However, the traditional handcrafted feature approach relies on manually designed rules, making it difficult to adapt to the differences in micro-expression data across individuals and scenarios. Given that deep learning can automatically learn the optimal feature representation of an image, the performance of deep learning-based micro-expression recognition far exceeds that of traditional methods. However, micro-expressions occur as subtle facial changes, so the recognition task remains challenging. By analyzing the pixel movement between consecutive frames, optical flow can represent the dynamic information of micro-expressions. Deep learning-based micro-expression recognition methods therefore describe facial muscle motion with optical flow information to improve recognition performance. However, existing methods usually extract the optical flow offline. This approach relies on existing optical flow estimation techniques, describes subtle expressions insufficiently, and neglects static facial expression information, all of which restrict the recognition performance of the model. Therefore, this study proposes a micro-expression recognition network based on adaptive optical flow estimation, which performs optical flow estimation and micro-expression classification in parallel association so that micro-expression-related motion features are learned adaptively.

Method The training samples of micro-expressions are limited, which makes it difficult to train complex network models. Therefore, in the preprocessing stage, this study first selects the apex frame and its neighboring frames in the micro-expression video sequence as training data. In addition, when the data are loaded, the original training data are replaced, with a certain probability, by image pairs from the video sequence that contain motion information. Second, a deep learning network with a dense differential encoder-decoder performs adaptive optical flow estimation of facial muscle motion to improve the characterization of subtle expressions. In the dense differential encoder, ResNet18 extracts features from the two frames and from their difference map. The branches processing the two frames share parameters. A motion enhancement module is added to the feature extraction branch of the difference image to accomplish interlayer information interaction. In the motion enhancement module, the difference map features computed from the two frames pass through a spatial attention mechanism to focus on micro-expression-related motion; the two frames are also subtracted from each other to preserve and amplify the difference between them, and using both features provides valid information for subsequent networks. The decoder maps the multilevel facial displacement information extracted by the dense differential encoder, together with the last-layer output features of the two frames, to reconstruct the optical flow features. Vision Transformer is a deep learning model based on the self-attention mechanism, which has a global perception capability in comparison with the traditional convolutional neural network. With the feature extraction capability of vision Transformer, the micro-expression discriminative information embedded in the reconstructed optical flow is then mined. Finally, the semantic information of micro-expressions extracted from the facial displacement information and the discriminative information extracted by the vision Transformer model are fused to provide rich information for micro-expression classification. This study constrains the optical flow estimation task with an endpoint error loss, which continuously reduces the Euclidean distance between the predicted and real optical flow. Cross-entropy loss constraints are applied to the features extracted by vision Transformer and to the fused features so that the network learns micro-expression-related information. At the same time, the image with low motion intensity in the two frames is treated as equivalent to a neutral expression (without motion information), and a KL-divergence loss is applied to the features output by the encoder to suppress irrelevant information. The loss functions interact to complete the network optimization.

Result This study evaluates the model performance on a public dataset using the leave-one-subject-out cross-validation strategy. Face alignment and cropping are performed on the public dataset samples to unify the dataset. To demonstrate that the proposed method is state of the art, we compare it with existing mainstream methods on the composite dataset constructed from SMIC, SAMM, and CASME II. Our method achieves 82.89% and 85.59% UF1 and UAR on the whole dataset, 78.16% and 80.89% UF1 and UAR on the SMIC part, 94.52% and 96.02% UF1 and UAR on the CASME II part, and 73.24% and 75.83% UF1 and UAR on the SAMM part. Our method achieves the best results on the whole dataset, the SMIC part, and the CASME II part, and suboptimal results on the SAMM part. Compared to the latest proposed micro-expression method based on feature representation learning with adaptive displacement generation and Transformer fusion (FRL-DGT), our method demonstrates an improvement of 1.77% and 4.85%.

Conclusion The micro-expression recognition model based on adaptive optical flow estimation proposed in this study combines the two tasks of adaptive optical flow estimation and micro-expression classification. On the one hand, it senses subtle facial movements in an end-to-end manner and improves the ability to describe subtle expressions; on the other hand, it fully exploits the micro-expression discriminative information and enhances micro-expression recognition performance.
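
As a reading aid, the two quantities the abstract relies on, the endpoint error used for the optical flow task and the UF1/UAR scores used for evaluation, can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `endpoint_error` and `uf1_uar`, the flat list-of-pairs flow layout, and the three-class toy labels are all hypothetical choices made for illustration.

```python
import math


def endpoint_error(pred_flow, true_flow):
    """Mean Euclidean distance between predicted and reference flow vectors.

    Assumption: each flow is a flat list of (u, v) displacement pairs,
    one pair per pixel, in the same pixel order.
    """
    if len(pred_flow) != len(true_flow) or not pred_flow:
        raise ValueError("flows must be non-empty and of equal length")
    total = 0.0
    for (pu, pv), (tu, tv) in zip(pred_flow, true_flow):
        total += math.hypot(pu - tu, pv - tv)
    return total / len(pred_flow)


def uf1_uar(y_true, y_pred, num_classes):
    """Unweighted F1 (macro F1) and unweighted average recall.

    Both scores average the per-class values with equal weight, so small
    classes count as much as large ones.
    """
    f1_scores, recalls = [], []
    for c in range(num_classes):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        f1_denominator = 2 * tp + fp + fn
        f1_scores.append(2 * tp / f1_denominator if f1_denominator else 0.0)
        support = tp + fn
        recalls.append(tp / support if support else 0.0)
    return sum(f1_scores) / num_classes, sum(recalls) / num_classes


if __name__ == "__main__":
    # Toy flow field with two pixels (illustrative values only).
    print(endpoint_error([(3.0, 4.0), (0.0, 0.0)], [(0.0, 0.0), (0.0, 0.0)]))

    # Toy labels for three hypothetical classes 0, 1, 2.
    uf1, uar = uf1_uar([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2], num_classes=3)
    print(round(uf1, 4), round(uar, 4))
```

Under a leave-one-subject-out protocol, one plausible usage is to collect the predictions of all held-out subjects into a single pair of label lists and call `uf1_uar` once on the pooled lists; whether the paper pools predictions this way is an assumption here.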

Facial micro-expression recognition using three-stream vision transformer network with sparse sampling and relabeling

Facial Expression Recognition Based on Multi-Scale Convolutional Vision Transformer

Facial Expression Recognition With Visual Transformers and Attentional Selective Fusion

Dual-Branch Cross-Attention Network for Micro-Expression Recognition with Transformer Variants

MFEViT: A Robust Lightweight Transformer-based Network for Multimodal 2D+3D Facial Expression Recognition

Identity-invariant representation and transformer-style relation for micro-expression recognition

Facial Expression Recognition Based on Fine-Tuned Channel–Spatial Attention Transformer

AST+SVMNet: A Novel Decomposition Method for Micro-Expression Recognition Based on Fusion Attention and Improved Spatio-Temporal Convolution by Feature Transfer

MCCA-VNet: A Vit-Based Deep Learning Approach for Micro-Expression Recognition Based on Facial Coding

Adaptive Optical Flow Estimation-Driven Micro-Expression Recognition

Enhanced Hybrid Vision Transformer with Multi-Scale Feature Integration and Patch Dropping for Facial Expression Recognition

Self-supervised vision transformer-based few-shot learning for facial expression recognition

MViT: Mask Vision Transformer for Facial Expression Recognition in the Wild

A Multi-stream Convolutional Neural Network for Micro-expression Recognition Using Optical Flow and EVM

Two-Level Spatio-Temporal Feature Fused Two-Stream Network for Micro-Expression Recognition

Inceptr: micro-expression recognition integrating inception-CBAM and vision transformer

Disentangling 3D/4D Facial Affect Recognition with Faster Multi-View Transformer

Feature Representation Learning with Adaptive Displacement Generation and Transformer Fusion for Micro-Expression Recognition

MFDAN: Multi-level Flow-Driven Attention Network for Micro-Expression Recognition

Micro-Expression Recognition Based on Multi-task Learning and Resnet18