FERMixNet: an Occlusion Robust Facial Expression Recognition Model with Facial Mixing Augmentation and Mid-Level Representation Learning

Yansong Huang,Junjie Peng,Wenqiang Zhang,Tong Zhao,Gan Chen,Shuhua Tan,Fen Yi,Lu Wang
DOI: https://doi.org/10.1109/taffc.2024.3454102
IF: 13.99
2024-01-01
IEEE Transactions on Affective Computing
Abstract:Facial expressions can provide a better understanding of people's mental status and attitudes towards specific things. However, facial occlusion in real world is an unfavorable phenomenon that greatly affects the performance of facial expression recognition models. Recent works addressing the occlusion problem have primarily relied on attention mechanisms or occlusion discarding methods that focus on non-occluded regions of the face. However, these methods have not achieved a good balance between occlusion robustness and model efficiency. In this paper, we propose a simple and efficient model, called FERMixNet, for occluded facial expression recognition. The model incorporates a novel facial mixing augmentation strategy (FERMix) that generates new training samples by simulating real-world facial occlusion and preserving high expression-related semantic information. By co-training the original and newly generated samples, the model's occlusion robustness is improved without increasing its complexity during inference. Additionally, to further enhance the model's occlusion robustness, we include mid-level representation learning in the network to learn the discriminative non-occluded local features of the samples with low computational cost. Extensive experiments on four public facial occlusion datasets: Occlusion-RAF-DB, Occlusion-FERPlus and FED-RO show that the proposed model achieves state-of-the-art results which demonstrates the good robustness of our method for occluded facial expression recognition. Meanwhile, the proposed model also achieves state-of-the-art results on the in-the-wild facial expression datasets RAF-DB, AffectNet-8, and AffectNet-7. It proves that the proposed model has good application prospects in real world.
What problem does this paper attempt to address?