Open-Set Video-based Facial Expression Recognition with Human Expression-sensitive Prompting

Yuanyuan Liu, Yuxuan Huang, Shuyang Liu, Yibing Zhan, Zijing Chen, Zhe Chen

2024-08-01

Abstract: In Video-based Facial Expression Recognition (V-FER), models are typically trained on closed-set datasets with a fixed number of known classes. However, these models struggle with unknown classes common in real-world scenarios. In this paper, we introduce a challenging Open-set Video-based Facial Expression Recognition (OV-FER) task, aiming to identify both known and new, unseen facial expressions. While existing approaches use large-scale vision-language models like CLIP to identify unseen classes, we argue that these methods may not adequately capture the subtle human expressions needed for OV-FER. To address this limitation, we propose a novel Human Expression-Sensitive Prompting (HESP) mechanism to significantly enhance CLIP's ability to model video-based facial expression details effectively. Our proposed HESP comprises three components: 1) a textual prompting module with learnable prompts to enhance CLIP's textual representation of both known and unknown emotions, 2) a visual prompting module that encodes temporal emotional information from video frames using expression-sensitive attention, equipping CLIP with a new visual modeling ability to extract emotion-rich information, and 3) an open-set multi-task learning scheme that promotes interaction between the textual and visual modules, improving the understanding of novel human emotions in video sequences. Extensive experiments conducted on four OV-FER task settings demonstrate that HESP can significantly boost CLIP's performance (a relative improvement of 17.93% on AUROC and 106.18% on OSCR) and outperform other state-of-the-art open-set video understanding methods by a large margin. Code is available at https://github.com/cosinehuang/HESP.
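The abstract reports gains in AUROC and OSCR, two metrics commonly used for open-set recognition. The sketch below shows one common way to compute them, plus the relative-improvement arithmetic behind figures such as 17.93% and 106.18%. It is only an illustration: the paper's exact evaluation protocol is not given here, the function names (`auroc`, `oscr`, `relative_improvement`) are hypothetical, the scores are made-up toy data, and the trapezoidal integration used for OSCR is an assumption (some implementations use a step-wise sum instead).

```python
def auroc(known_scores, unknown_scores):
    """Probability that a random known sample gets a higher confidence
    than a random unknown sample (ties count as half)."""
    wins = 0.0
    for k in known_scores:
        for u in unknown_scores:
            if k > u:
                wins += 1.0
            elif k == u:
                wins += 0.5
    return wins / (len(known_scores) * len(unknown_scores))


def oscr(known_scores, known_correct, unknown_scores):
    """Area under the curve of Correct Classification Rate (CCR) against
    False Positive Rate (FPR) as the acceptance threshold is lowered.

    known_scores   -- confidence for each known-class sample
    known_correct  -- True where the closed-set prediction was correct
    unknown_scores -- confidence for each unknown-class sample
    """
    n_known, n_unknown = len(known_scores), len(unknown_scores)
    thresholds = sorted(set(known_scores) | set(unknown_scores), reverse=True)
    points = [(0.0, 0.0)]  # (FPR, CCR) when nothing is accepted
    for t in thresholds:
        ccr = sum(1 for s, ok in zip(known_scores, known_correct)
                  if ok and s >= t) / n_known
        fpr = sum(1 for s in unknown_scores if s >= t) / n_unknown
        points.append((fpr, ccr))
    # Trapezoidal integration over the (FPR, CCR) points (an assumption).
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def relative_improvement(baseline, improved):
    """Relative gain over the baseline, in percent."""
    return 100.0 * (improved - baseline) / baseline


# Toy data (not from the paper).
known = [0.9, 0.8, 0.7, 0.4]
correct = [True, True, False, True]
unknown = [0.6, 0.3, 0.2]

print("AUROC:", auroc(known, unknown))
print("OSCR:", oscr(known, correct, unknown))
print("Relative gain (%):", relative_improvement(0.5, 0.6))
```

Note that a relative improvement above 100%, like the OSCR figure quoted above, means the metric more than doubled relative to the baseline; it does not mean the metric itself exceeds 100%.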

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper addresses video-based facial expression recognition (V-FER) in open-set settings. Specifically:

1. **Open-Set Video-based Facial Expression Recognition (OV-FER) Task**:
   - Existing V-FER models are typically trained on closed-set datasets and can only recognize a fixed set of predefined expression categories.
   - These models struggle when they encounter unknown expressions in real-world scenarios.
   - To address this, the authors introduce the OV-FER task, which aims to recognize both known expressions and new, unseen ones.
2. **Proposed New Method**:
   - The authors argue that existing CLIP-based approaches may not adequately capture the subtle human expressions that OV-FER requires, and propose a novel Human Expression-Sensitive Prompting (HESP) mechanism to address this.
   - HESP consists of three components: a textual prompting module, a visual prompting module, and an open-set multi-task learning scheme.
3. **Objectives**:
   - Enhance CLIP's ability to model subtle facial expression details in videos, thereby improving recognition of both known and unknown expressions.
   - Experimental results on four OV-FER task settings show that HESP significantly improves CLIP's performance, with a relative improvement of 17.93% in AUROC and 106.18% in OSCR.

Through these improvements, the paper aims to establish a more robust model capable of recognizing a wide range of facial expressions in diverse environments, with applications in fields such as intelligent healthcare and human-computer interaction.
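To make the open-set task concrete, here is a minimal sketch of a generic CLIP-style open-set decision rule: compare a video embedding with one text embedding per known expression, and reject the sample as "unknown" when the best match is not confident enough. This is a common baseline pattern, not the HESP method itself. The function `open_set_predict`, the threshold of 0.5, and the three-dimensional toy embeddings are all assumptions for illustration; the temperature of 0.07 is only a typical CLIP-style value.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def open_set_predict(video_emb, class_text_embs, class_names,
                     temperature=0.07, threshold=0.5):
    """Return (label, confidence); label is "unknown" when the highest
    class probability falls below the threshold (hypothetical rule)."""
    sims = [cosine(video_emb, t) for t in class_text_embs]
    probs = softmax([s / temperature for s in sims])
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "unknown", probs[best]
    return class_names[best], probs[best]


# Toy embeddings standing in for encoder outputs (not real CLIP features).
names = ["happy", "sad", "angry"]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

print(open_set_predict([0.9, 0.1, 0.0], text_embs, names))  # close to "happy"
print(open_set_predict([1.0, 1.0, 1.0], text_embs, names))  # equally far from all
```

In practice the threshold trades off rejecting unknown expressions against wrongly rejecting known ones, which is why threshold-free metrics such as AUROC and OSCR are used to compare methods.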

cGAN Based Facial Expression Recognition for Human-Robot Interaction

DR-FER: Discriminative and Robust Representation Learning for Facial Expression Recognition

Open-Set Facial Expression Recognition

CLIPER: A Unified Vision-Language Framework for In-the-Wild Facial Expression Recognition

Generalizable Facial Expression Recognition

The Relationship Between the Three‐Dimensional (3D) Structures of BF Molecules and MHC‐Related Marek's Disease Resistance in Chickens

Efficient Facial Expression Recognition with Representation Reinforcement Network and Transfer Self-Training for Human–Machine Interaction

Variance-Aware Bi-Attention Expression Transformer for Open-Set Facial Expression Recognition in the Wild

Facial Expression Recognition Based on Zero-Addition Pretext Training and Feature Conjunction-Selection Network in Human–Robot Interaction

FERV39k: A Large-Scale Multi-Scene Dataset for Facial Expression Recognition in Videos

Clip-aware expressive feature learning for video-based facial expression recognition

Combining 2D Gabor and Local Binary Pattern for Facial Expression Recognition Using Extreme Learning Machine

SAANet: Siamese Action-Units Attention Network for Improving Dynamic Facial Expression Recognition

Prompting Visual-Language Models for Dynamic Facial Expression Recognition

FineCLIPER: Multi-modal Fine-grained CLIP for Dynamic Facial Expression Recognition with AdaptERs

EmoCLIP: A Vision-Language Method for Zero-Shot Video Facial Expression Recognition

A Survey on Facial Expression Recognition of Static and Dynamic Emotions

Semantic-Rich Facial Emotional Expression Recognition

Knowledge-Enhanced Facial Expression Recognition with Emotional-to-Neutral Transformation

OUS: Scene-Guided Dynamic Facial Expression Recognition