Driver Facial Expression Recognition Based on ViT and StarGAN

Zhijie Huang, Yuezhao Yu, Chao Gou
DOI: https://doi.org/10.1109/dtpi52967.2021.9540071
2021-01-01
Abstract: Over the past decades, the development of deep neural networks (DNNs) has enabled a growing number of DNN-based facial expression recognition (FER) methods to overcome the limitations of conventional machine-learning-based FER methods and achieve excellent performance. In driving scenarios, however, FER performance is limited by large head poses, variable illumination, and the lack of driver facial expression datasets. In this work, instead of a traditional CNN backbone, we introduce the Vision Transformer (ViT), a deep network model based on the multi-head self-attention mechanism, and perform data augmentation based on the parallel imaging framework with a StarGAN network. Experimental results on the CK+ and KMU-FED benchmark datasets validate the effectiveness of the proposed method.
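To make the multi-head self-attention mechanism mentioned in the abstract concrete, the following is a minimal NumPy sketch of how a ViT-style model might split a face image into patches and apply one attention layer. It is not the authors' implementation: the image size (48x48 grayscale), patch size, embedding width, head count, random weights, and the helper names `patchify` and `multi_head_self_attention` are all assumptions chosen for illustration, and the class token, positional embeddings, layer normalization, and MLP blocks of a full ViT are omitted.

```python
import numpy as np


def patchify(image, patch):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    gh, gw = h // patch, w // patch
    x = image.reshape(gh, patch, gw, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (gh, gw, patch, patch, c)
    return x.reshape(gh * gw, patch * patch * c)


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(tokens, w_q, w_k, w_v, w_o, num_heads):
    """Scaled dot-product self-attention over (n, d) tokens with several heads."""
    n, d = tokens.shape
    assert d % num_heads == 0, "embedding width must divide across heads"
    dh = d // num_heads

    def split(x):
        # (n, d) -> (heads, n, dh)
        return x.reshape(n, num_heads, dh).transpose(1, 0, 2)

    q, k, v = split(tokens @ w_q), split(tokens @ w_k), split(tokens @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)  # (heads, n, n)
    weights = softmax(scores, axis=-1)               # each row sums to 1
    heads = weights @ v                              # (heads, n, dh)
    merged = heads.transpose(1, 0, 2).reshape(n, d)  # concatenate heads
    return merged @ w_o, weights


rng = np.random.default_rng(0)
image = rng.random((48, 48, 1))        # hypothetical grayscale face crop
patches = patchify(image, patch=16)    # 3 x 3 grid of 16x16 patches

d_model, num_heads = 64, 4             # assumed sizes, not from the paper
w_embed = rng.normal(scale=0.02, size=(patches.shape[1], d_model))
tokens = patches @ w_embed             # linear patch embedding

w_q, w_k, w_v, w_o = (
    rng.normal(scale=0.02, size=(d_model, d_model)) for _ in range(4)
)
out, attn = multi_head_self_attention(tokens, w_q, w_k, w_v, w_o, num_heads)
print(out.shape, attn.shape)
```

Each head produces its own 9x9 attention map over the patches, so different heads are free to attend to different facial regions; in a trained model these weights would be learned rather than random.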