Progressive Self-supervised Representation Learning for 3D Facial Expression Recognition

Hebeizi Li, Hongyu Yang, Di Huang
DOI: https://doi.org/10.1109/ijcb62174.2024.10744458
2024-01-01
Abstract: Facial expression recognition (FER) is a critical area of research in face analysis. While 2D data has been extensively used, 3D data offers inherent advantages, such as increased resilience to illumination and pose variations. However, the limited size of current 3D FER datasets significantly constrains the performance of 3D FER methods. To overcome this challenge, we propose a novel self-supervised pre-training scheme that leverages large-scale external 3D data, followed by fine-tuning on 3D FER datasets. Our approach starts with self-supervised learning on a large-scale 3D point cloud object dataset, specifically ShapeNet. We then move on to the FaceScape dataset, which is primarily used for morphable face prediction. To enhance robustness, we integrate synthetic data before fine-tuning on specific FER datasets. This multi-stage process allows the model to progressively learn 3D facial expression representations from coarse to fine. For this purpose, we utilize Point-MAE, a leading self-supervised model for representation learning. To enhance its ability on the FER task, we further incorporate facial priors in the masking and point sampling steps, leveraging the distinctive characteristics of facial data. Our method achieves state-of-the-art performance on both the BU-3DFE and Bosphorus datasets, matching or surpassing results achieved by other 2D+3D FER techniques.
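As a rough illustration of the masking and point sampling steps mentioned in the abstract, the sketch below shows the generic patch pipeline commonly associated with Point-MAE-style pre-training: farthest point sampling to pick patch centers, k-nearest-neighbor grouping to form patches, and random masking of a fraction of those patches. It is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not say how facial priors enter these steps, so none are modeled here; the function names, the toy random cloud, and the parameter values (16 centers, 8 neighbors, 0.6 mask ratio) are all hypothetical choices made for illustration.

```python
import math
import random


def farthest_point_sampling(points, num_centers, rng):
    """Pick num_centers point indices that are spread out over the cloud."""
    first = rng.randrange(len(points))
    chosen = [first]
    # dist[i] holds the distance from point i to its nearest chosen center.
    dist = [math.dist(p, points[first]) for p in points]
    while len(chosen) < num_centers:
        # The next center is the point farthest from all centers chosen so far.
        nxt = max(range(len(points)), key=dist.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            dist[i] = min(dist[i], math.dist(p, points[nxt]))
    return chosen


def group_patches(points, center_ids, k):
    """For each center, collect the indices of its k nearest points."""
    patches = []
    for c in center_ids:
        order = sorted(
            range(len(points)),
            key=lambda i: math.dist(points[i], points[c]),
        )
        patches.append(order[:k])
    return patches


def random_mask(num_patches, mask_ratio, rng):
    """Split patch indices into (masked, visible) uniformly at random."""
    num_masked = int(num_patches * mask_ratio)
    ids = list(range(num_patches))
    rng.shuffle(ids)
    return sorted(ids[:num_masked]), sorted(ids[num_masked:])


# Toy stand-in for a 3D scan: 256 random points in the unit cube (assumption).
rng = random.Random(0)
cloud = [(rng.random(), rng.random(), rng.random()) for _ in range(256)]

centers = farthest_point_sampling(cloud, num_centers=16, rng=rng)
patches = group_patches(cloud, centers, k=8)
masked, visible = random_mask(len(patches), mask_ratio=0.6, rng=rng)
```

In a masked autoencoder setup, only the visible patches would be passed to the encoder, and the decoder would be trained to reconstruct the masked ones. A face-specific variant could plausibly replace the uniform choices in `farthest_point_sampling` or `random_mask` with region-aware ones, but that is speculation beyond what the abstract states.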