A dual-branch network based on optical flow learning and semantic consistency for macro-expression spotting

Yun Xian, Dong Zhang, Xingzhi Wang, Dah-Jye Lee
DOI: https://doi.org/10.1007/s10489-024-05726-1
IF: 5.3
2024-09-19
Applied Intelligence
Abstract: Macro-expression spotting is an important preliminary step in many dynamic facial expression analysis applications. It automatically detects the onset and offset frames of a macro-expression (MaE) in a video. State-of-the-art macro-expression spotting methods characterize facial muscle movement through explicit analysis of the optical flow map and have achieved promising results. However, these methods perform optical flow map estimation and expression spotting in two separate, successive stages. In this paper, we propose a new dual-branch network that jointly optimizes the expression spotting and optical flow estimation tasks. The proposed dual-branch network implicitly learns optical flow during training and enriches the feature representation with motion information. During inference, we use only the encoder of the optical flow estimation network for motion feature extraction and integrate it with expression spotting into a one-stage framework. The proposed method therefore does not need to construct optical flow maps explicitly during inference, which significantly reduces the computational cost. We also apply a consistency constraint on the global- and local-level semantic features of the clip to guide the model to focus on the category-consistent regions of the video. We evaluate the proposed method extensively on two popular facial expression spotting datasets, CAS(ME)² and SAMM Long Videos. The experimental results show that, compared with state-of-the-art methods, the proposed method improves the F1-scores for MaE spotting by 5.81 and 1.57 on the CAS(ME)² and SAMM Long Videos datasets, respectively.
computer science, artificial intelligence
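
The abstract reports results as F1-scores over spotted onset/offset intervals. As a rough illustration of how such a score can be computed, the sketch below matches predicted intervals to ground-truth intervals by intersection over union (IoU) and derives precision, recall, and F1. The function names, the greedy one-to-one matching, and the 0.5 IoU threshold are assumptions made for this example; the abstract does not state the paper's exact evaluation protocol.

```python
def interval_iou(a, b):
    """IoU of two inclusive frame intervals given as (onset, offset)."""
    intersection = min(a[1], b[1]) - max(a[0], b[0]) + 1
    if intersection <= 0:
        return 0.0
    union = (a[1] - a[0] + 1) + (b[1] - b[0] + 1) - intersection
    return intersection / union


def spotting_f1(predicted, ground_truth, iou_threshold=0.5):
    """F1-score for interval spotting.

    A prediction counts as a true positive when its IoU with a not yet
    matched ground-truth interval reaches iou_threshold (assumed value).
    Each ground-truth interval can be matched at most once.
    """
    matched = set()
    true_positives = 0
    for pred in predicted:
        best_index, best_iou = None, 0.0
        for index, truth in enumerate(ground_truth):
            if index in matched:
                continue
            overlap = interval_iou(pred, truth)
            if overlap >= iou_threshold and overlap > best_iou:
                best_index, best_iou = index, overlap
        if best_index is not None:
            matched.add(best_index)
            true_positives += 1

    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(ground_truth)
    return 2 * precision * recall / (precision + recall)
```

For example, with predictions `[(10, 40), (100, 130), (300, 320)]` and ground truth `[(12, 42), (105, 160)]`, only the first prediction overlaps a ground-truth interval strongly enough to match, giving a precision of 1/3, a recall of 1/2, and an F1-score of 0.4.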