Micro-expression Spotting with Multi-scale Local Transformer in Long Videos

Xupeng Guo, Xiaobiao Zhang, Lei Li, Zhaoqiang Xia
DOI: https://doi.org/10.1016/j.patrec.2023.03.012
IF: 4.757
2023-03-15
Pattern Recognition Letters
Abstract: Micro-expression analysis with computer vision techniques has attracted much attention because it can reveal human emotions automatically. Among the analysis tasks, temporal spotting, which locates expression-aware frames in long video sequences, is the most challenging. Compared with the well-studied recognition task, spotting needs more research to improve performance and benefit subsequent tasks. In this paper, we propose a convolutional transformer-based deep model for micro-expression spotting in long video sequences. A 3D convolutional subnetwork first extracts visual features from the frames in a fixed-size sliding window over the original video sequence. A multi-scale local transformer module then operates on these visual features to model the correlation between frames in a local window. By leveraging this correlation information, the description of facial movement becomes more representative for micro-expressions of various durations. Finally, a multi-head classifier and the corresponding estimator are combined to predict the temporal position for spotting micro-expressions. The proposed method is evaluated on two publicly available datasets, CAS(ME)² and SAMM-LV, and achieves an F1-score of 0.2770 on SAMM-LV and 0.1373 on CAS(ME)². The code is publicly available on GitHub (https://github.com/xiazhaoqiang/MULT-MicroExpressionSpot).
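Two ideas in the abstract, the fixed-size sliding window and the multi-scale local window, can be illustrated with a minimal Python sketch. It is not the authors' implementation (their code is in the GitHub repository above). The function names, the window size, the stride, and the radii are hypothetical values chosen only for illustration, and the boolean mask is just one common way to restrict attention to a local neighbourhood.

```python
from typing import Dict, List, Tuple


def sliding_windows(num_frames: int, window: int, stride: int) -> List[Tuple[int, int]]:
    """Return [start, end) frame ranges of fixed-size windows over a long video.

    Assumption: trailing frames that do not fill a whole window are dropped.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if num_frames < window:
        return []
    return [(s, s + window) for s in range(0, num_frames - window + 1, stride)]


def local_attention_mask(length: int, radius: int) -> List[List[bool]]:
    """mask[i][j] is True when position j lies within `radius` steps of position i."""
    return [[abs(i - j) <= radius for j in range(length)] for i in range(length)]


def multi_scale_masks(length: int, radii: List[int]) -> Dict[int, List[List[bool]]]:
    """Build one local mask per radius (hypothetical stand-in for 'multi-scale')."""
    return {r: local_attention_mask(length, r) for r in radii}


if __name__ == "__main__":
    # Hypothetical numbers: a 10-frame clip, windows of 4 frames, stride 2.
    print(sliding_windows(10, 4, 2))
    for radius, mask in multi_scale_masks(4, [1, 2]).items():
        print(radius, mask[0])
```

A larger radius lets each position attend to more neighbours, which is one plausible reading of how several scales could cover micro-expressions of different durations.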
computer science, artificial intelligence
What problem does this paper attempt to address?