FFNet-M: Feature Fusion Network with Masks for Multimodal Facial Expression Recognition

Mingzhe Sui, Zhaoqing Zhu, Feng Zhao, Feng Wu
DOI: https://doi.org/10.1109/ICME51207.2021.9428100
2021-01-01
Abstract: Compared with 2D facial expression recognition (FER) and 3D FER, 2D+3D FER can handle the effects of illumination changes and pose variations. The combination of 2D texture and 3D attribute information can further improve the performance. However, most existing approaches still face two challenges: the selection of proper networks for extracting multimodal features, and the significance of local features in salient regions for expression classification. To address these challenges, we propose an efficient feature fusion network with masks (FFNet-M) for 2D+3D FER. Each 3D scan is represented by three types of attribute maps (i.e., depth map, normal map, and texture image), which are then fed into FFNet-M with different networks to extract both 2D and 3D features. Moreover, we design two masks to make FFNet-M focus on 2D local features while paying attention to 3D local features in salient regions. Experimental results show that our FFNet-M outperforms state-of-the-art methods on the BU-3DFE dataset and also achieves a high accuracy on the Bosphorus dataset.
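As a rough illustration of the pipeline the abstract outlines (attribute maps in, masked features fused), the sketch below may help. It does not reproduce FFNet-M: the abstract does not specify the networks or how the two masks are built, so everything here is an assumption. The function names `normal_map_from_depth`, `apply_mask`, and `fuse` are hypothetical, the normal map is estimated with plain finite differences, masking is modeled as element-wise weighting, and fusion is modeled as simple concatenation. Raw maps stand in for the learned features a real network would produce.

```python
import math


def normal_map_from_depth(depth):
    """Estimate per-pixel unit surface normals from a depth map (list of rows).

    Assumption: normals are taken proportional to (-dz/dx, -dz/dy, 1),
    with the gradients approximated by finite differences.
    """
    h, w = len(depth), len(depth[0])
    normals = []
    for y in range(h):
        row = []
        for x in range(w):
            # Central differences inside the grid, one-sided at the border.
            x0, x1 = max(x - 1, 0), min(x + 1, w - 1)
            y0, y1 = max(y - 1, 0), min(y + 1, h - 1)
            dzdx = (depth[y][x1] - depth[y][x0]) / max(x1 - x0, 1)
            dzdy = (depth[y1][x] - depth[y0][x]) / max(y1 - y0, 1)
            nx, ny, nz = -dzdx, -dzdy, 1.0
            norm = math.sqrt(nx * nx + ny * ny + nz * nz)
            row.append((nx / norm, ny / norm, nz / norm))
        normals.append(row)
    return normals


def apply_mask(feature_map, mask):
    """Weight a 2D feature map element-wise by a mask with values in [0, 1].

    Hypothetical stand-in for the paper's masks, whose design is not
    described in the abstract.
    """
    return [[f * m for f, m in zip(f_row, m_row)]
            for f_row, m_row in zip(feature_map, mask)]


def fuse(features_2d, features_3d):
    """Fuse two feature maps by flattening and concatenating them.

    Concatenation is an assumed fusion rule, chosen only for illustration.
    """
    def flatten(fm):
        return [v for row in fm for v in row]

    return flatten(features_2d) + flatten(features_3d)


# Toy usage: a tilted plane z = x as the depth map.
depth = [[0.0, 1.0, 2.0],
         [0.0, 1.0, 2.0],
         [0.0, 1.0, 2.0]]
normals = normal_map_from_depth(depth)

texture = [[1.0, 2.0], [3.0, 4.0]]   # stand-in for 2D features
mask_2d = [[1.0, 0.0], [0.0, 1.0]]   # stand-in for a salient-region mask
masked_texture = apply_mask(texture, mask_2d)
fused = fuse(masked_texture, [[5.0, 6.0]])
```

In this toy setup, the mask zeroes out the non-salient positions before the 2D and 3D features are combined, which is one simple way to read "focus on local features in salient regions".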