CNN-Transformer Architecture Solution for Compound Facial Expression Recognition

Sana Ullah, Yuanlun Xie, Jie Ou, Wenhong Tian
DOI: https://doi.org/10.1109/iccc59590.2023.10507346
2023-01-01
Abstract: Facial expressions can reveal a person's true emotions. Facial expression recognition has wide applications in healthcare, security, artificial intelligence, e-learning, sports, agriculture, and many other fields. Although considerable research has addressed basic emotions, interest in recognizing compound facial expressions of emotion is now growing in the field of image processing. In this paper, we present a method that combines a vision transformer (ViT) with a DenseNet-121 backbone to recognize compound facial expressions of emotion on the CFEE and RAFDB datasets. The proposed method outperforms state-of-the-art (SOTA) models on compound emotion recognition, achieving accuracies of 66.4% on CFEE and 72.05% on RAFDB. Compared with the current SOTA, this is an improvement of approximately 9.05% on CFEE and 3.62% on RAFDB. The approach addresses the compound facial expression recognition task and opens the way for further work on detecting complex emotions with ViT models.