Ig3D: Integrating 3D Face Representations in Facial Expression Inference

Lu Dong, Xiao Wang, Srirangaraj Setlur, Venu Govindaraju, Ifeoma Nwogu
2024-08-30
Abstract: Reconstructing 3D faces with facial geometry from single images has allowed for major advances in animation, generative models, and virtual reality. However, this ability to represent faces with their 3D features is not as fully explored by the facial expression inference (FEI) community. This study therefore aims to investigate the impacts of integrating such 3D representations into the FEI task, specifically for facial expression classification and face-based valence-arousal (VA) estimation. To accomplish this, we first assess the performance of two 3D face representations (both based on the 3D morphable model, FLAME) for the FEI tasks. We further explore two fusion architectures, intermediate fusion and late fusion, for integrating the 3D face representations with existing 2D inference frameworks. To evaluate our proposed architecture, we extract the corresponding 3D representations and perform extensive tests on the AffectNet and RAF-DB datasets. Our experimental results demonstrate that our proposed method outperforms the state of the art on the AffectNet VA estimation and RAF-DB classification tasks. Moreover, our method can act as a complement to other existing methods to boost performance in many emotion inference tasks.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The problem this paper attempts to solve is how to integrate 3D face representations into the facial expression inference (FEI) task in order to improve face-based facial expression classification and valence-arousal (VA) estimation. Specifically:

1. **Evaluating the effects of 3D face representations**:
   - The researchers first evaluated two 3D face representation methods (EMOCA and SMIRK), both based on the 3D morphable model FLAME, on the FEI task.
2. **Design and evaluation of fusion architectures**:
   - To combine 3D face representations with existing 2D inference frameworks, the researchers explored two fusion architectures: intermediate fusion and late fusion. Both aim to integrate features from the two modalities.
3. **Experimental verification**:
   - The researchers conducted extensive experiments on the AffectNet and RAF-DB datasets to evaluate the proposed fusion architectures. The results show that the late-fusion architecture outperforms existing state-of-the-art methods on multiple metrics.

Through this work, the researchers aim to show that 3D face representations can significantly improve performance on the FEI task and to provide insights for future research.

### Summary of key contributions

1. **Analysis of 3D face representation parameters**:
   - Compared two recent 3D face regression models (EMOCA and SMIRK) and demonstrated that EMOCA performs better on the FEI task on the benchmark datasets.
2. **Proposing two fusion architectures**:
   - Proposed intermediate-fusion and late-fusion architectures and showed experimentally that late fusion yields better facial expression inference performance.
3. **Efficient and flexible architecture**:
   - Proposed a simple and effective architecture that can be applied flexibly across multiple emotion inference tasks.

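
The difference between the two fusion strategies named above can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's implementation: the function names (`intermediate_fusion`, `late_fusion`), the single linear classification head, the toy feature sizes, and the blending weight `w_3d` are all assumptions made for the example. Intermediate fusion merges the 2D and 3D feature vectors before a shared classifier, while late fusion lets each branch predict on its own and then blends the predictions.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, features):
    """One dense layer: `weights` holds one row of coefficients per output class."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]


def intermediate_fusion(feat_2d, feat_3d, weights, bias):
    """Concatenate 2D and 3D features, then classify with one shared head."""
    fused = feat_2d + feat_3d  # list concatenation, not element-wise addition
    return softmax(linear(weights, bias, fused))


def late_fusion(probs_2d, probs_3d, w_3d=0.5):
    """Blend per-branch class probabilities; `w_3d` is the (assumed) 3D weight."""
    return [(1.0 - w_3d) * p2 + w_3d * p3 for p2, p3 in zip(probs_2d, probs_3d)]


if __name__ == "__main__":
    # Toy inputs: a 2-dim "2D" feature, a 1-dim "3D" feature, 2 classes.
    head_w = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4]]
    head_b = [0.0, 0.1]
    print(intermediate_fusion([1.0, 2.0], [0.5], head_w, head_b))
    print(late_fusion([0.6, 0.4], [0.2, 0.8], w_3d=0.5))
```

In this sketch the late-fusion weight is a single scalar that could be tuned on a validation set. Whether the paper tunes its weights this way is not stated in the summary.
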
The experimental results show that this method outperforms the existing state of the art on the AffectNet VA estimation and RAF-DB classification tasks.

### Experimental results

- On the AffectNet dataset, the weighted late-fusion strategy increased the accuracy by 3.84%, the F1-score by 3.80%, the precision by 3.75%, and the recall by 3.84%.
- On the RAF-DB dataset, the late-fusion method achieved an accuracy of 94.00%, surpassing the existing state of the art.

### Conclusion

The research shows that introducing 3D face representations can significantly improve performance on facial expression inference tasks, especially continuous emotion inference. The late-fusion architecture performs well at integrating 3D and 2D information, providing a new direction for future research.
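
For readers who want to reproduce the four classification metrics quoted above, the following is a small, dependency-free sketch. The summary does not say how precision, recall, and F1 are averaged across classes, so support-weighted averaging is an assumption here, and the function name `weighted_metrics` and the toy labels are hypothetical. Under this averaging, weighted recall always equals accuracy, which would be consistent with the identical 3.84% gains reported for both, though that is not confirmed by the source.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1 (assumed averaging)."""
    n = len(y_true)
    support = Counter(y_true)  # number of true samples per class
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

    precision = recall = f1 = 0.0
    for label in sorted(support):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        prec_c = tp / predicted if predicted else 0.0
        rec_c = tp / support[label]
        denom = prec_c + rec_c
        f1_c = 2.0 * prec_c * rec_c / denom if denom else 0.0

        weight = support[label] / n  # class frequency in the ground truth
        precision += weight * prec_c
        recall += weight * rec_c
        f1 += weight * f1_c

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy three-class example; the labels are arbitrary integers.
    truth = [0, 0, 1, 1, 2, 2]
    preds = [0, 1, 1, 1, 2, 0]
    print(weighted_metrics(truth, preds))
```

Macro averaging (an unweighted mean over classes) is an equally plausible choice and would give different numbers on imbalanced data, so any comparison against published figures should first confirm which convention the benchmark uses.
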