R-FENet: A Region-based Facial Expression Recognition Method Inspired by Semantic Information of Action Units

Cong Wang, Ke Lu, Jian Xue, Yanfu Yan
DOI: https://doi.org/10.1145/3422852.3423482
2020-10-12
Abstract: Facial expression recognition is a challenging problem in real-world scenarios owing to illumination, occlusion, pose variations, and low-quality images. Recent works have used the concept of the region of interest (RoI) to strengthen local regional features in the representation of facial expressions. However, the regions are mostly assigned by general experience, for example, as the average areas of the eyes, mouth, and nose. In addition, RoI features are extracted from cropped image patches. This operation is repetitive and inefficient because the RoI areas mostly overlap. This paper presents R-FENet, a region-based convolutional neural network for facial expression recognition. The proposed network is built on ResNet and on predefined expert knowledge from the Facial Action Coding System. To locate the regions related to facial expression, three RoI groups (the upper, middle, and lower facial RoIs) comprising seven RoI areas are delimited according to the semantic relationship between action units and facial expressions. Furthermore, to avoid extracting features repeatedly from the original image, an RoI pooling layer is used to extract the RoI features. The proposed R-FENet is validated on two public datasets of facial expressions captured in the wild: AffectNet and SFEW. Experiments show that, among single-model methods, the proposed method achieves state-of-the-art results, with an accuracy of 60.95% on AffectNet and 55.97% on SFEW.
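The RoI pooling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single-channel feature map stored as nested lists, integer RoI boxes given in feature-map cells, and max pooling over evenly divided bins. The function name roi_max_pool and the example boxes are hypothetical and do not correspond to the seven RoI areas defined in the paper. The point is only that every RoI is pooled from one shared feature map, so overlapping regions do not require the backbone to be run again on cropped patches.

```python
from typing import List, Tuple

FeatureMap = List[List[float]]      # one channel, indexed as [row][col]
Box = Tuple[int, int, int, int]     # (x0, y0, x1, y1), end-exclusive, in feature-map cells


def roi_max_pool(fmap: FeatureMap, box: Box, out_h: int, out_w: int) -> FeatureMap:
    """Max-pool the cells inside `box` into a fixed out_h x out_w grid."""
    x0, y0, x1, y1 = box
    h, w = y1 - y0, x1 - x0
    if h <= 0 or w <= 0:
        raise ValueError("RoI box must cover at least one cell")

    pooled = []
    for i in range(out_h):
        # Bin rows: floor for the start, ceil for the end, so no bin is empty.
        r0 = y0 + (i * h) // out_h
        r1 = y0 + -((-(i + 1) * h) // out_h)
        row = []
        for j in range(out_w):
            c0 = x0 + (j * w) // out_w
            c1 = x0 + -((-(j + 1) * w) // out_w)
            row.append(max(fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# A toy 6x6 feature map, computed once and shared by all RoIs.
fmap = [[float(r * 6 + c) for c in range(6)] for r in range(6)]

# Hypothetical, partly overlapping RoIs (not the paper's regions).
rois = {
    "upper": (0, 0, 6, 2),
    "middle": (1, 1, 5, 5),
    "lower": (1, 4, 5, 6),
}

# Every RoI yields a fixed-size 2x2 feature, regardless of its box size.
features = {name: roi_max_pool(fmap, box, 2, 2) for name, box in rois.items()}
```

In a full network the same operation would be applied per channel to the backbone's output, with box coordinates scaled from image pixels to feature-map cells.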