Abstract: Facial expression recognition (FER) on 3D face scans has received significant attention in recent years. Most facial expression recognition methods have been proposed mainly for 2D images. These methods suffer from several issues, such as illumination changes and pose variations. Moreover, a 2D mapping from 3D images may lack some geometric and topological characteristics of the face. To overcome this problem, a multi-modal 2D + 3D feature-based method is proposed. We extract shallow features from the 3D images, and deep features from the transformed 2D images using Convolutional Neural Networks (CNNs). To combine these features into a compact representation, we use covariance matrices as descriptors for both feature types instead of using the individual descriptors on their own. Covariance matrix learning is used as a manifold layer to reduce the size of the deep covariance matrices and enhance their discriminative power while preserving their manifold structure. We then use the Bag-of-Features (BoF) paradigm to quantize the covariance matrices after flattening. Accordingly, we obtain two codebooks, one from the shallow features and one from the deep features. The global codebook is then used to feed an SVM classifier. High classification performance is achieved on the BU-3DFE and Bosphorus datasets compared to state-of-the-art methods.
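To make the covariance-descriptor idea concrete, here is a minimal NumPy sketch. It computes a covariance matrix from a set of local feature vectors and flattens it into a vector. The abstract does not say which flattening is used, so the log-Euclidean mapping (matrix logarithm followed by upper-triangle vectorization) is an assumption. The function names, the regularization constant `eps`, and the random input data are all hypothetical.

```python
import numpy as np


def covariance_descriptor(features, eps=1e-6):
    """Covariance matrix of n local feature vectors, shape (n, d) -> (d, d)."""
    features = np.asarray(features, dtype=float)
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / max(features.shape[0] - 1, 1)
    # Assumption: a small ridge keeps the matrix strictly positive definite.
    return cov + eps * np.eye(cov.shape[0])


def log_euclidean_flatten(cov):
    """Assumed flattening: matrix log, then the upper triangle as a vector."""
    eigvals, eigvecs = np.linalg.eigh(cov)
    log_cov = (eigvecs * np.log(eigvals)) @ eigvecs.T
    rows, cols = np.triu_indices(cov.shape[0])
    # Off-diagonal entries are scaled by sqrt(2) so the vector's Euclidean
    # norm equals the Frobenius norm of the log-matrix.
    weights = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return weights * log_cov[rows, cols]


# Placeholder data: 200 local features of dimension 5 (not from the paper).
rng = np.random.default_rng(0)
local_features = rng.normal(size=(200, 5))

cov = covariance_descriptor(local_features)
vec = log_euclidean_flatten(cov)
print(cov.shape, vec.shape)
```

A d-dimensional feature set yields a d x d symmetric matrix, so the flattened vector has d(d+1)/2 entries regardless of how many local features were pooled.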

What problem does this paper attempt to address?

This paper addresses the challenges of 3D facial expression recognition (3D FER): how to extract effective features from 3D facial scans, given that 2D image-based methods are sensitive to illumination and pose changes and that mapping 3D images to 2D can lose geometric and topological features of the face. To this end, the paper proposes a multimodal method that combines 2D and 3D features, aiming to improve 3D facial expression recognition performance. Specifically, the paper focuses on the following points:

1. **Feature Extraction**: Extract shallow features from the 3D images, and use a Convolutional Neural Network (CNN) to extract deep features from the transformed 2D images.
2. **Feature Fusion**: Combine these features into a compact representation, using covariance matrices as descriptors for both feature types rather than the individual descriptors on their own.
3. **Covariance Matrix Learning**: Reduce the size of the deep covariance matrices through a manifold layer, while enhancing their discriminative power and preserving their manifold structure.
4. **Feature Quantization**: Quantize the flattened covariance matrices using the Bag-of-Features (BoF) paradigm to generate two codebooks (one based on shallow features and one based on deep features).
5. **Classification**: Feed the global codebook to a Support Vector Machine (SVM) classifier to recognize facial expressions.

With this method, the paper reports high classification performance on the BU-3DFE and Bosphorus datasets compared to state-of-the-art methods. The approach is motivated by the sensitivity of 2D-only methods to illumination and pose changes.
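The quantization step can be sketched as follows, under stated assumptions. The code fits one codebook per modality with plain k-means, turns the flattened descriptors of one face into a histogram over codewords, and concatenates the two histograms. The codebook sizes, the descriptor dimensions, the use of concatenation to form the global representation, and the random data are all hypothetical; the summary does not give these details.

```python
import numpy as np


def fit_codebook(vectors, n_words, n_iter=20, seed=0):
    """Plain Lloyd k-means; returns codewords of shape (n_words, dim)."""
    rng = np.random.default_rng(seed)
    vectors = np.asarray(vectors, dtype=float)
    start = rng.choice(len(vectors), size=n_words, replace=False)
    centers = vectors[start].copy()
    for _ in range(n_iter):
        dists = ((vectors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for k in range(n_words):
            members = vectors[labels == k]
            if len(members):  # keep the old center if a cluster empties
                centers[k] = members.mean(axis=0)
    return centers


def bof_histogram(vectors, codebook):
    """Normalized histogram of nearest-codeword assignments."""
    vectors = np.asarray(vectors, dtype=float)
    dists = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    counts = np.bincount(dists.argmin(axis=1), minlength=len(codebook))
    return counts / counts.sum()


# Placeholder training descriptors (flattened covariances), not real data.
rng = np.random.default_rng(1)
shallow_train = rng.normal(size=(300, 15))
deep_train = rng.normal(size=(300, 10))
shallow_codebook = fit_codebook(shallow_train, n_words=8)
deep_codebook = fit_codebook(deep_train, n_words=6)

# Assumption: one face contributes a set of descriptors per modality.
face_shallow = rng.normal(size=(20, 15))
face_deep = rng.normal(size=(20, 10))
face_vector = np.concatenate([
    bof_histogram(face_shallow, shallow_codebook),
    bof_histogram(face_deep, deep_codebook),
])
print(face_vector.shape)
```

Vectors like `face_vector`, one per face scan, could then be passed to an SVM implementation such as scikit-learn's `SVC`; that step is omitted here to keep the sketch dependent on NumPy only.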

Deep and Shallow Covariance Feature Quantization for 3D Facial Expression Recognition

An ICA-Based Other-Race Effect Elimination for Facial Expression Recognition.

Automatic Analysis of Facial Expressions Based on Deep Covariance Trajectories

Deep Covariance Descriptors for Facial Expression Recognition

Deep Representation of Facial Geometric and Photometric Attributes for Automatic 3D Facial Expression Recognition

Facial Expression Recognition in Video Using 3D-CNN Deep Features Discrimination

Towards Reading Beyond Faces for Sparsity-aware 3D/4D Affect Recognition

Multimodal 2D+3D Facial Expression Recognition with Deep Fusion Convolutional Neural Network

Automatic facial expression recognition on a single 3D face by exploring shape deformation.

Covariance Pooling For Facial Expression Recognition

Landmarks-assisted Collaborative Deep Framework for Automatic 4D Facial Expression Recognition.

Feature Level Analysis for 3D Facial Expression Recognition.

Fusing Normal Vector and Curvature Features on the Mesh for 3D Facial Expression Recognition.

Fully automatic 3D facial expression recognition using a region-based approach

Joint Structured Sparsity Regularized Multiview Dimension Reduction for Video-Based Facial Expression Recognition.

Automatic 4D Facial Expression Recognition via Collaborative Cross-domain Dynamic Image Network.

Towards Reading Beyond Faces for Sparsity-Aware 4D Affect Recognition

Fully automatic 3D facial expression recognition using differential mean curvature maps and histograms of oriented gradients

Multi-channel Deep 3D Face Recognition

3D Dynamic Facial Expression Recognition Using Low-Resolution Videos

Sparse Orthogonal Tucker Decomposition for 2D+3D Facial Expression Recognition