Facial Expression Recognition Based on Multi-Scale Convolutional Vision Transformer

Cheng-Shan Jiang, Zhen-Tao Liu
DOI: https://doi.org/10.23919/ascc56756.2022.9828211
2022-01-01
Abstract: Facial expression recognition (FER) can give artificial intelligence devices such as service robots a better understanding of human emotional states, which makes human-computer interaction more harmonious and natural. Convolution is limited by its receptive field, so the features it extracts are local, which makes it hard to learn facial expression information from a global point of view. In this paper, a Multi-Scale Convolutional Vision Transformer (MSC-ViT) is proposed for FER. It replaces the linear embedding with Convolutional Tokenization, uses Multi-Scale Convolutional Position Mapping (MSCPM) to obtain multi-scale feature information for each facial expression image patch, and then integrates information and learns features from a global perspective. We evaluate MSC-ViT on the RaFD dataset, where it achieves a recognition accuracy of 98.26%.
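The idea of swapping a linear patch embedding for convolutional tokenization can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not give kernel sizes, strides, or channel counts, so the function name `conv_tokenize`, the toy 4x4 image, and the two hand-written 2x2 kernels are all assumptions chosen for illustration. The sketch only shows that a convolution with a stride smaller than the kernel size yields overlapping tokens, while a stride equal to the kernel size gives the disjoint patches of a standard ViT-style embedding. MSCPM is not sketched here, since the abstract does not describe its internals.

```python
def conv_tokenize(image, kernels, stride=1):
    """Slide each k x k kernel over a 2D image and return one token per window.

    Each token is a list with one response per kernel, so the number of
    kernels plays the role of the embedding dimension.
    """
    k = len(kernels[0])
    height, width = len(image), len(image[0])
    tokens = []
    for top in range(0, height - k + 1, stride):
        for left in range(0, width - k + 1, stride):
            token = []
            for kernel in kernels:
                response = sum(
                    kernel[i][j] * image[top + i][left + j]
                    for i in range(k)
                    for j in range(k)
                )
                token.append(response)
            tokens.append(token)
    return tokens


# Toy 4x4 "image" and two hypothetical 2x2 kernels (embedding dimension 2).
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
kernels = [
    [[1, 0], [0, 1]],  # sums the main diagonal of each window
    [[1, 1], [1, 1]],  # sums every pixel in each window
]

# Stride 1: neighbouring windows overlap, so adjacent tokens share pixels.
overlapping = conv_tokenize(image, kernels, stride=1)
# Stride 2 (equal to the kernel size): disjoint patches, as in a linear embedding.
disjoint = conv_tokenize(image, kernels, stride=2)

print(len(overlapping), len(disjoint))
```

In a trained model the kernel weights would be learned rather than fixed, and the resulting token sequence would be passed to the transformer encoder.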