A Region Group Adaptive Attention Model for Subtle Expression Recognition
Gan Chen, Junjie Peng, Wenqiang Zhang, Kanrun Huang, Feng Cheng, Haochen Yuan, Yansong Huang
DOI: https://doi.org/10.1109/taffc.2021.3133429
IF: 13.99
2023-01-01
IEEE Transactions on Affective Computing
Abstract: Facial expression recognition has received extensive attention in recent years because of its important applications in many fields. Most expression samples used in research are relatively easy to analyze because they show explicit, high-intensity expressions. However, in situations such as video question and answer, business negotiation, polygraph detection in the security field, autism treatment, and medical escort, emotions are suppressed and appear as low-intensity or subtle expressions, which makes them difficult to estimate accurately. In these situations, effectively extracting expression features from facial expression images is a critical problem that affects the accuracy of subtle expression recognition. To address this problem, we propose an end-to-end group adaptive attention model for subtle expression recognition. The model crops an image into several regions of interest (ROIs) according to the correlations between the facial skeleton and emotions, then analyzes the relationships among the regions of interest and the mutual relations between the local regions and the holistic region. Using the region group adaptive attention mechanism, the model trains the convolutional neural network to extract features that represent facial expressions, which increases the accuracy and robustness of recognition, particularly for subtle facial expressions. To improve the ability of different regional features to discriminate expressions, a group adaptive loss function is introduced to verify and improve estimation accuracy. Extensive experiments are conducted on the public face datasets CK+, JAFFE, and KDEF and on the self-collected subtle expression dataset SFER. The proposed model achieves accuracies of 99.59%, 95.20%, and 93.47% on CK+, JAFFE, and KDEF, respectively.
The proposed model thus generally achieves better performance in facial expression recognition than other methods.
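
The abstract does not specify how the region attention weights are computed, so the following is only a minimal sketch of the general idea of weighting local region features against a holistic feature. It assumes scaled dot-product scoring followed by a softmax, which is a generic attention scheme substituted here for illustration and not the paper's region group adaptive attention mechanism. The names `region_attention`, `region_feats`, and `holistic_feat` are hypothetical, and the feature vectors are toy values rather than CNN outputs.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def region_attention(region_feats, holistic_feat):
    """Score each region against the holistic feature, then fuse.

    Assumption: scaled dot-product scoring; the paper's actual
    mechanism is not described in the abstract.
    """
    dim = len(holistic_feat)
    scores = [dot(r, holistic_feat) / math.sqrt(dim) for r in region_feats]
    weights = softmax(scores)
    # Attention-weighted sum of the region features.
    fused = [
        sum(w * r[i] for w, r in zip(weights, region_feats))
        for i in range(dim)
    ]
    return weights, fused


# Toy 2-D features for three hypothetical regions (e.g. eyes, mouth, brows).
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
holistic = [1.0, 1.0]
weights, fused = region_attention(regions, holistic)
```

In this toy setup the third region aligns most closely with the holistic feature and therefore receives the largest weight; a trained model would learn such weights from data rather than from a fixed similarity score.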