Transformer Embedded Spectral-Based Graph Network for Facial Expression Recognition

Xing Jin, Xulin Song, Xiyin Wu, Wenzhu Yan
DOI: https://doi.org/10.1007/s13042-023-02016-z
2024-01-01
International Journal of Machine Learning and Cybernetics
Abstract: Deep graph convolution networks, which exploit relations among facial muscle movements for facial expression recognition (FER), have achieved great success. Because of their limited receptive field, existing graph convolution operations struggle to model long-range relations among muscle movements, which play a crucial role in FER. To alleviate this issue, we introduce the transformer encoder into graph convolution networks, so that all facial muscle movements can interact within a global receptive field and more complex relations can be modeled. Specifically, we construct facial graph data by cropping regions of interest (ROIs) associated with facial action units, and each ROI is represented by the hidden-layer representation of a deep auto-encoder. To effectively extract features from the constructed facial graph data, we propose a novel transformer embedded spectral-based graph convolution network (TESGCN), in which the transformer encoder is exploited to model complex relations among facial ROIs for FER. We empirically show the superiority of the proposed model over vanilla graph convolution networks through extensive experiments on four facial expression datasets. Moreover, the proposed TESGCN has only 80K parameters and a model size of 0.41 MB, and achieves results comparable to those of existing lightweight networks.
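The core idea in the abstract, local graph convolution over ROI nodes followed by global self-attention, can be illustrated with a minimal NumPy sketch. This is not the authors' TESGCN: the first-order normalized-adjacency convolution, the single attention head, the chain-shaped graph, the random features standing in for auto-encoder codes, and all names and dimensions below are assumptions made for illustration only.

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} (first-order spectral GCN)."""
    a_hat = adj + np.eye(adj.shape[0])        # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def graph_conv(x, adj_norm, weight):
    """One graph convolution layer with ReLU: each node mixes only with its neighbors."""
    return np.maximum(adj_norm @ x @ weight, 0.0)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product attention: every node attends to every node."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)       # row-wise softmax
    return attn @ v, attn

rng = np.random.default_rng(0)
num_rois, feat_dim, hidden_dim = 6, 16, 8         # hypothetical sizes

# Hypothetical ROI features; in the paper these come from a deep auto-encoder.
x = rng.standard_normal((num_rois, feat_dim))

# Hypothetical chain graph: ROI i is linked only to ROI i+1.
adj = np.zeros((num_rois, num_rois))
for i in range(num_rois - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0

adj_norm = normalized_adjacency(adj)
h = graph_conv(x, adj_norm, 0.1 * rng.standard_normal((feat_dim, hidden_dim)))

w_q, w_k, w_v = (0.1 * rng.standard_normal((hidden_dim, hidden_dim)) for _ in range(3))
out, attn = self_attention(h, w_q, w_k, w_v)
h_out = h + out                                   # residual connection
```

In this toy graph, `adj_norm[0, 5]` is zero, so a single graph convolution cannot pass information between the first and last ROI, whereas `attn[0, 5]` is strictly positive. That contrast is the limited versus global receptive field the abstract refers to.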