Abstract: Micro-expressions (MEs) are involuntary, brief facial expressions that reveal subtle psychological activity. Most previous research has treated micro-expression spotting and recognition as two separate tasks. We propose a high-precision "spotting+recognition" method that spots ME intervals in long videos and recognizes their emotional categories. Because MEs occur sparsely, long videos contain far fewer micro-expression intervals than non-micro-expression (non-ME) intervals. This imbalance makes it difficult for models trained with conventional strategies to distinguish true MEs from noise samples caused by head movements, blinking, and macro-expressions, which leads to a high false-positive rate and degrades overall performance. We reduce the number of smooth segments to alter the data distribution within the non-ME category, which lets the model focus on the subtle differences between noise samples and ME samples. To achieve this, we design a training data preparation strategy: false-positive samples from the initial spotting results serve as non-ME category samples, while true-positive and false-negative samples from the initial spotting serve as emotion category samples. Combining these as training data yields a recognition model capable of both emotion classification and non-ME category determination. Additionally, we propose a three-stage micro-expression analysis method consisting of ME spotting, ME recognition, and a non-ME interval removal module. We validate our method through five-fold cross-validation experiments on the CAS(ME)² and SAMM Long Video datasets, where it achieves an overall STRS score of 0.16, significantly outperforming the baseline methods and demonstrating the effectiveness of our approach.
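The training data preparation strategy and the final non-ME removal stage can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions not stated above. First, a spotted interval counts as a true positive when its frame-level IoU with an unmatched ground-truth interval is at least 0.5, the criterion commonly used in MEGC spotting evaluation. Second, matching is greedy and one-to-one. Third, true-positive samples keep the spotted boundaries, while false negatives use the ground-truth boundaries. The names `Interval`, `build_training_samples`, `remove_non_me` and `NON_ME` are hypothetical, and the recognition model is stubbed as a plain callable. Only intervals the spotter actually fired on become non-ME samples, so smooth segments that the spotter ignores never enter the non-ME class.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical label for the extra "not a micro-expression" class.
NON_ME = "non-ME"

Sample = Tuple[int, int, str]  # (onset frame, offset frame, class label)


@dataclass(frozen=True)
class Interval:
    onset: int                      # first frame (inclusive)
    offset: int                     # last frame (inclusive)
    emotion: Optional[str] = None   # ground-truth emotion; None for spotted intervals


def iou(a: Interval, b: Interval) -> float:
    """Frame-level intersection over union of two inclusive intervals."""
    inter = min(a.offset, b.offset) - max(a.onset, b.onset) + 1
    if inter <= 0:
        return 0.0
    union = (a.offset - a.onset + 1) + (b.offset - b.onset + 1) - inter
    return inter / union


def build_training_samples(spotted: List[Interval],
                           ground_truth: List[Interval],
                           threshold: float = 0.5) -> List[Sample]:
    """Turn initial spotting results into recognition training data (assumed rules).

    TP -> spotted boundaries + matched ground-truth emotion
    FP -> spotted boundaries + NON_ME
    FN -> ground-truth boundaries + ground-truth emotion
    """
    samples: List[Sample] = []
    matched = set()
    for p in spotted:
        # Greedily match each spotted interval to the best still-unmatched ground truth.
        best_j, best_iou = -1, 0.0
        for j, g in enumerate(ground_truth):
            score = iou(p, g)
            if j not in matched and score > best_iou:
                best_j, best_iou = j, score
        if best_j >= 0 and best_iou >= threshold:
            matched.add(best_j)  # true positive
            samples.append((p.onset, p.offset, ground_truth[best_j].emotion))
        else:
            samples.append((p.onset, p.offset, NON_ME))  # false positive
    # Ground-truth MEs the spotter missed (false negatives) are still emotion samples.
    for j, g in enumerate(ground_truth):
        if j not in matched:
            samples.append((g.onset, g.offset, g.emotion))
    return samples


def remove_non_me(spotted: List[Interval],
                  classify: Callable[[Interval], str]) -> List[Sample]:
    """Stage 3: keep only spotted intervals the recognizer does not label NON_ME."""
    results: List[Sample] = []
    for iv in spotted:
        label = classify(iv)
        if label != NON_ME:
            results.append((iv.onset, iv.offset, label))
    return results


if __name__ == "__main__":
    gt = [Interval(10, 20, "positive"), Interval(100, 115, "surprise")]
    spotted = [Interval(11, 21), Interval(50, 60)]
    print(build_training_samples(spotted, gt))
    # [(11, 21, 'positive'), (50, 60, 'non-ME'), (100, 115, 'surprise')]
```

In the usage example, the spotted interval 11 to 21 overlaps the ground truth 10 to 20 with IoU 10/12 and becomes an emotion sample. The interval 50 to 60 matches nothing and becomes a non-ME sample. The missed ground truth 100 to 115 is added as a false-negative emotion sample. At inference, a recognizer trained on such samples would be passed to `remove_non_me` to discard spotted intervals it assigns to the non-ME class.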

3D Feature Extraction Network Based on Self-supervision for Micro-expression Spotting

Micro-expression Spotting with Multi-scale Local Transformer in Long Videos

A Multi-scale Feature Learning Network with Optical Flow Correction for Micro- and Macro-expression Spotting

3D-CNN for Facial Micro- and Macro-expression Spotting on Long Video Sequences using Temporal Oriented Reference Frame

Micro-expression recognition using 3D DenseNet fused Squeeze-and-Excitation Networks

Efficient Micro-Expression Spotting Based on Main Directional Mean Optical Flow Feature

Micro-expression recognition based on multi-scale 3D residual convolutional neural network

Facial Micro-Expression Recognition Based on Multi-Scale Temporal and Spatial Features

Three-Stream Temporal-Shift Attention Network Based on Self-Knowledge Distillation for Micro-Expression Recognition

Micro-expression spotting with a novel wavelet convolution magnification network in long videos

Micro-Expression Spotting Based on a Short-Duration Prior and Multi-Stage Feature Extraction

Enhancing Micro-Expression Analysis Performance by Effectively Addressing Data Imbalance

LGSNet: A Two-Stream Network for Micro- and Macro-Expression Spotting With Background Modeling

Spatiotemporal Recurrent Convolutional Networks for Recognizing Spontaneous Micro-expressions

SpotFormer: Multi-Scale Spatio-Temporal Transformer for Facial Expression Spotting

MESNet: A Convolutional Neural Network for Spotting Multi-Scale Micro-Expression Intervals in Long Videos

Synergistic Spotting and Recognition of Micro-Expression via Temporal State Transition

Spontaneous Facial Micro-expression Recognition Via Deep Convolutional Network

Spatio-temporal Fusion for Macro- and Micro-expression Spotting in Long Video Sequences

Integrating VideoMAE based model and Optical Flow for Micro- and Macro-expression Spotting

Research on Micro-Expression Spotting Method Based on Optical Flow Features