An Encoder-Decoder Network with Residual and Attention Blocks for Full-Face 3D Gaze Estimation

Xinyuan Song, Shaoxiang Guo, Zhenfu Yu, Junyu Dong
DOI: https://doi.org/10.1109/icivc55077.2022.9886734
2022-01-01
Abstract: This paper proposes a novel end-to-end network to improve the accuracy of gaze estimation from full-face input images. We first explore the possibility of using an encoder-decoder network to reconstruct the input face image. We then use a U-Net with residual blocks to retain eye features hidden in high-resolution feature maps, which are often lost in down-sampling and convolution layers. Finally, we add spatial and channel-wise attention blocks to the model to better capture global relations among different regions and to enhance the contribution of valuable gaze-related regions. We conduct experiments on the ETH-XGaze dataset. The results show that the proposed model is highly competitive with existing state-of-the-art methods for person-independent gaze estimation.
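The abstract does not specify how the attention blocks are built, so the following is only a minimal NumPy sketch of one common way to combine channel-wise attention, spatial attention, and a residual connection on a single feature map. All function names, the weight matrices `w1` and `w2`, and the parameter-free spatial map are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """Reweight channels of a (C, H, W) feature map.

    Assumed design: global average pooling followed by a two-layer
    bottleneck (w1: (C//r, C), w2: (C, C//r)) and a sigmoid gate.
    """
    squeezed = feat.mean(axis=(1, 2))        # (C,) one statistic per channel
    hidden = np.maximum(w1 @ squeezed, 0.0)  # ReLU bottleneck
    weights = sigmoid(w2 @ hidden)           # (C,) gates in (0, 1)
    return feat * weights[:, None, None]


def spatial_attention(feat):
    """Reweight spatial locations of a (C, H, W) feature map.

    Assumption: the gate is derived directly from channel-pooled maps;
    a trained block would typically pass them through a learned convolution.
    """
    avg_map = feat.mean(axis=0)              # (H, W)
    max_map = feat.max(axis=0)               # (H, W)
    gate = sigmoid(avg_map + max_map)        # (H, W) gates in (0, 1)
    return feat * gate[None, :, :]


def residual_attention_block(feat, w1, w2):
    """Apply channel then spatial attention, and add the input back."""
    refined = spatial_attention(channel_attention(feat, w1, w2))
    return feat + refined                    # residual (skip) connection


# Hypothetical usage with random weights and an 8-channel 4x4 feature map.
rng = np.random.default_rng(0)
feat = rng.random((8, 4, 4))
w1 = rng.standard_normal((2, 8))
w2 = rng.standard_normal((8, 2))
out = residual_attention_block(feat, w1, w2)
```

The skip connection keeps the original features available even where the attention gates are small, which is in the spirit of the abstract's goal of retaining eye features that would otherwise be lost.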