Occlusion-Aware Visual-Language Model for Occluded Facial Expression Recognition

Ding Wang, Yu Gu, Liang Luo, Fuji Ren
DOI: https://doi.org/10.1109/ijcnn60899.2024.10651502
2024-01-01
Abstract: Recent research on facial expression recognition (FER) has achieved strong performance on FER datasets. However, in real-world recognition scenarios, performance is often compromised by occlusion, particularly since the COVID-19 pandemic. To address this issue, we propose a novel framework named OCLIPER, a visual-language model designed to enhance occluded facial expression recognition. Specifically, our approach consists of two parts: 1) Visual Part: We first generate realistic occlusions commonly encountered in daily life, such as masks, glasses, and hands, and incorporate them into FER datasets. We then introduce a similarity loss between original facial images and occluded facial images, which guides the image encoder to learn a robust facial representation that is insensitive to occlusion. 2) Text Part: Learnable prompts and the Occluded Facial Expression Descriptor (OFED) are used as inputs to the text encoder. Learnable prompts help the model understand the relevant context information for each expression. OFED comprises a series of text descriptions, generated by ChatGPT, of facial expressions behind occlusion. Ultimately, the text part guides the image part in learning occlusion-resistant features. Experimental results on various databases demonstrate the superiority of our proposed method over state-of-the-art approaches.
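The abstract does not give the exact form of the similarity loss, so the following is only a minimal sketch of one common choice: one minus the cosine similarity between the features of an original image and its occluded counterpart, averaged over a batch. The function names `cosine_similarity` and `similarity_loss`, the plain-list feature vectors, and the cosine form itself are assumptions made for illustration, not the authors' implementation.

```python
import math


def cosine_similarity(u, v, eps=1e-12):
    """Cosine similarity of two equal-length feature vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # eps guards against division by zero for all-zero vectors (assumption).
    return dot / max(norm_u * norm_v, eps)


def similarity_loss(clean_features, occluded_features):
    """Hypothetical similarity loss: mean of (1 - cosine similarity) over pairs.

    clean_features[i] and occluded_features[i] are assumed to be image-encoder
    outputs for the same face without and with a synthetic occlusion.
    """
    if len(clean_features) != len(occluded_features):
        raise ValueError("feature batches must have the same length")
    losses = [
        1.0 - cosine_similarity(c, o)
        for c, o in zip(clean_features, occluded_features)
    ]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    clean = [[1.0, 0.0], [0.0, 2.0]]
    occluded = [[1.0, 0.0], [3.0, 0.0]]
    # First pair is identical, second pair is orthogonal.
    print(similarity_loss(clean, occluded))
```

Under this assumed form, the loss is near zero when the encoder maps an occluded face to the same direction as the unoccluded one, and grows as the two representations diverge, which matches the stated goal of an occlusion-insensitive representation.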