Abstract:

Objective: Micro-expressions are brief, subtle facial muscle movements that involuntarily reveal emotions when a person tries to hide their true feelings. They reflect a person's true feelings and motivations more faithfully than macro-expressions. Micro-expression recognition aims to automatically identify the emotional category of a subject from the movement of the facial muscles under stress, and it has important applications in lie detection, psychological diagnosis, and other areas. In the early development of micro-expression recognition, local binary patterns and optical flow were widely used as features for training traditional machine learning models. However, handcrafted features rely on manually designed rules, which makes it difficult to adapt to differences in micro-expression data across individuals and scenarios. Because deep learning can automatically learn the optimal feature representation of an image, deep learning-based micro-expression recognition far outperforms traditional methods. Nevertheless, micro-expressions appear as subtle facial changes, so the recognition task remains challenging. By analyzing the pixel movement between consecutive frames, optical flow can represent the dynamic information of micro-expressions, and deep learning-based methods use optical flow to describe facial muscle motion and improve recognition performance. However, existing methods usually extract optical flow offline with existing optical flow estimation techniques. This approach describes subtle expressions insufficiently and neglects static facial expression information, which limits recognition performance. Therefore, this study proposes a micro-expression recognition network based on adaptive optical flow estimation, which performs optical flow estimation and micro-expression classification in parallel so that motion features related to micro-expressions are learned adaptively.

Method: Training samples of micro-expressions are limited, which makes it difficult to train complex network models. In the preprocessing stage, this study therefore selects the apex frames and their neighboring frames in each micro-expression video sequence as training data. In addition, when the data are loaded, the original training data are replaced, with a certain probability, by image pairs from the video sequence that contain motion information. Next, a deep learning network with a dense differential encoder-decoder performs adaptive optical flow estimation of facial muscle motion to improve the characterization of subtle expressions. In the dense differential encoder, ResNet18 extracts features from the two frames and from their difference map, and the branches that process the two frames share parameters. A motion enhancement module is added to the feature extraction branch of the difference map to enable interlayer information interaction. In this module, the difference map features computed from the two frames are passed through a spatial attention mechanism to focus on micro-expression-related motion, and the two frames are subtracted from each other to preserve and amplify their difference; together, the two features provide valid information for the subsequent network. The decoder maps the multilevel facial displacement information extracted by the dense differential encoder, together with the last-layer output features of the two frames, to reconstruct the optical flow features. Vision Transformer is a deep learning model based on the self-attention mechanism and, in contrast to traditional convolutional neural networks, has global perception capability. Its feature extraction capability is used to mine the discriminative micro-expression information embedded in the reconstructed optical flow. Finally, the semantic information of micro-expressions extracted from the facial displacement information and the discriminative information extracted by the vision Transformer are fused to provide rich information for micro-expression classification. The optical flow estimation task is constrained by an endpoint error loss, which continuously reduces the Euclidean distance between the predicted and real optical flow. Cross-entropy losses are applied to the features extracted by the vision Transformer and to the fused features so that the network learns micro-expression-related information. At the same time, the image with low motion intensity in the two frames is treated as a neutral expression (without motion information), and a KL-divergence loss is applied to the features output by the encoder to suppress irrelevant information. These loss functions jointly drive the optimization of the network.

Result: This study evaluates model performance on public datasets using leave-one-subject-out cross-validation. Face alignment and cropping are applied to the samples to unify the data. To show that the proposed method is state of the art, we compare it with existing mainstream methods on the composite dataset constructed from SMIC, SAMM, and CASME II. Our method achieves UF1 and UAR of 82.89% and 85.59% on the whole dataset, 78.16% and 80.89% on the SMIC part, 94.52% and 96.02% on the CASME II part, and 73.24% and 75.83% on the SAMM part. It achieves the best results on the whole dataset, the SMIC part, and the CASME II part, and the second-best results on the SAMM part. Compared with the latest micro-expression method based on feature representation learning with adaptive displacement generation and Transformer fusion (FRL-DGT), our method demonstrates an improvement of 1.77% and 4.85%.

Conclusion: The proposed micro-expression recognition model based on adaptive optical flow estimation combines the two tasks of adaptive optical flow estimation and micro-expression classification. On the one hand, it senses subtle facial movements in an end-to-end manner and improves the description of subtle expressions; on the other hand, it fully exploits discriminative micro-expression information and improves micro-expression recognition performance.
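
The abstract describes the endpoint error loss as reducing the Euclidean distance between the predicted and real optical flow. As a minimal sketch of that quantity, the pure-Python function below averages the per-pixel Euclidean distance between two small flow fields. The function name `endpoint_error`, the nested-list layout, and the toy values are assumptions made for illustration; the study's own implementation is not given in the abstract and would operate on tensors inside a training loop.

```python
import math

def endpoint_error(pred_flow, true_flow):
    """Mean Euclidean distance between predicted and reference flow vectors.

    Each flow field is a list of rows, and each row is a list of
    (u, v) displacement pairs (hypothetical layout for this sketch).
    """
    if len(pred_flow) != len(true_flow):
        raise ValueError("flow fields must have the same number of rows")
    total, count = 0.0, 0
    for pred_row, true_row in zip(pred_flow, true_flow):
        if len(pred_row) != len(true_row):
            raise ValueError("flow rows must have the same length")
        for (pu, pv), (tu, tv) in zip(pred_row, true_row):
            # Distance between the two displacement vectors at this pixel.
            total += math.hypot(pu - tu, pv - tv)
            count += 1
    if count == 0:
        raise ValueError("flow fields must contain at least one vector")
    return total / count

# Toy 2x2 flow fields: only one pixel differs, by the vector (3, 4).
pred = [[(1.0, 0.0), (0.0, 0.0)], [(3.0, 4.0), (0.5, 0.5)]]
true = [[(1.0, 0.0), (0.0, 0.0)], [(0.0, 0.0), (0.5, 0.5)]]
epe = endpoint_error(pred, true)
print(epe)
```

Because the single mismatched pixel contributes a distance of 5 and the other three contribute 0, the mean over four pixels is 1.25.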
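
The leave-one-subject-out protocol mentioned in the Result section holds out all samples of one subject for testing and trains on the remaining subjects, repeating this once per subject. The sketch below shows one way to generate such folds. The sample dictionaries, the `subject` key, and the function name `leave_one_subject_out` are hypothetical and not taken from the study.

```python
from collections import defaultdict

def leave_one_subject_out(samples):
    """Yield (held_out_subject, train_samples, test_samples) per subject."""
    by_subject = defaultdict(list)
    for sample in samples:
        by_subject[sample["subject"]].append(sample)
    for subject in sorted(by_subject):
        test = by_subject[subject]
        # Every sample from every other subject goes into the training set.
        train = [s for other, group in by_subject.items()
                 if other != subject for s in group]
        yield subject, train, test

# Hypothetical toy metadata: three subjects, five clips in total.
samples = [
    {"subject": "s01", "clip": "a", "label": "negative"},
    {"subject": "s01", "clip": "b", "label": "positive"},
    {"subject": "s02", "clip": "c", "label": "surprise"},
    {"subject": "s03", "clip": "d", "label": "negative"},
    {"subject": "s03", "clip": "e", "label": "surprise"},
]
folds = list(leave_one_subject_out(samples))
for subject, train, test in folds:
    print(subject, len(train), len(test))
```

Splitting by subject rather than by clip keeps every clip of the held-out person out of training, so each fold measures generalization to an unseen individual.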
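
The results above are reported as UF1 and UAR. Under the usual definitions, UF1 (unweighted F1) is the mean of the per-class F1 scores and UAR (unweighted average recall) is the mean of the per-class recalls, so every class counts equally regardless of its sample count. The following sketch computes both from label lists. The three class names and the toy predictions are assumptions for illustration, not data from the study.

```python
def uf1_uar(y_true, y_pred, classes):
    """Return (UF1, UAR): class-averaged F1 and class-averaged recall."""
    f1_scores, recalls = [], []
    pairs = list(zip(y_true, y_pred))
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        f1_denom = 2 * tp + fp + fn
        # A class with no true or predicted samples scores 0 in this sketch.
        f1_scores.append(2 * tp / f1_denom if f1_denom else 0.0)
        recalls.append(tp / (tp + fn) if (tp + fn) else 0.0)
    return sum(f1_scores) / len(classes), sum(recalls) / len(classes)

# Hypothetical labels for eight clips across three emotion classes.
classes = ["negative", "positive", "surprise"]
y_true = ["negative"] * 4 + ["positive"] * 2 + ["surprise"] * 2
y_pred = ["negative", "negative", "negative", "positive",
          "positive", "negative", "surprise", "surprise"]
uf1, uar = uf1_uar(y_true, y_pred, classes)
print(uf1, uar)
```

In this toy case the per-class F1 scores are 0.75, 0.5, and 1.0 and the per-class recalls are the same values, so both metrics come out to 0.75. With leave-one-subject-out evaluation, the counts are commonly accumulated over all folds before the per-class scores are computed; whether the study does so is not stated in the abstract.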

Video Expression Recognition Method Based on Facial Motion Unit and Temporal Attention

An optimized Capsule-LSTM model for facial expression recognition with video sequences

Twin attention based multi-task convolutional bidirectional long short term memory for facial expression recognition

Real-Time facial expression recognition system based on HMM and feature point localization

Learning facial expression and body gesture visual information for video emotion recognition

Learning Expression Features via Deep Residual Attention Networks for Facial Expression Recognition From Video Sequences

Discriminative Video Representation with Temporal Order for Micro-expression Recognition

SAANet: Siamese Action-Units Attention Network for Improving Dynamic Facial Expression Recognition

Facial Micro-Expression Recognition Based on Multi-Scale Temporal and Spatial Features

Facial Expression Recognition Method Combined with Attention Mechanism

Micro-expression Video Clip Synthesis Method Based on Spatial-temporal Statistical Model and Motion Intensity Evaluation Function

Attention mechanism-based CNN for facial expression recognition

Clip-aware expressive feature learning for video-based facial expression recognition

Multi-Attention Module for Dynamic Facial Emotion Recognition

Microexpression Recognition Method Based on ADP-DSTN Feature Fusion and Convolutional Block Attention Module

A Face Sequence Recognition Method Based on Deep Convolutional Neural Network

Adaptive Optical Flow Estimation-Driven Micro-Expression Recognition

A method for recognizing facial expression intensity based on facial muscle variations

Subtle expression recognition-a comprehensive approach

Hierarchical Space-Time Attention for Micro-Expression Recognition

Expression recognition from video using a coupled hidden Markov model