Gaze Estimation by Attention-Induced Hierarchical Variational Auto-Encoder.

Guanhe Huang, Jingyue Shi, Jun Xu, Jing Li, Shengyong Chen, Yingjun Du, Xiantong Zhen, Honghai Liu
DOI: https://doi.org/10.1109/tcyb.2023.3312392
IF: 11.8
2024-01-01
IEEE Transactions on Cybernetics
Abstract: Appearance-based gaze estimation has been widely studied recently, with promising performance. Most appearance-based gaze estimation methods are developed under deterministic frameworks. However, deterministic gaze estimation methods suffer a large performance drop on challenging eye images, such as those with low resolution, darkness, or partial occlusion. To alleviate this problem, in this article we reformulate the appearance-based gaze estimation problem under a generative framework. Specifically, we propose a variational inference model, the variational gaze estimation network (VGE-Net), to generate multiple gaze maps as complementary candidates, all simultaneously supervised by the ground-truth gaze map. To achieve robust estimation, we adaptively fuse the gaze directions predicted from these candidate gaze maps by a regression network through a simple attention mechanism. Experiments on three benchmarks (MPIIGaze, EYEDIAP, and Columbia) demonstrate that VGE-Net outperforms state-of-the-art gaze estimation methods, especially in challenging cases. Comprehensive ablation studies also validate the effectiveness of our contributions. The code will be publicly released.
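The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the form of the attention mechanism, so everything below is an assumption made for illustration: each candidate gaze direction is taken to be a (pitch, yaw) pair in radians, each candidate carries a scalar attention score, and the scores are normalized with a softmax. The names `softmax` and `fuse_gaze` are hypothetical and do not come from the paper or its code.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_gaze(candidates, scores):
    """Fuse candidate gaze directions with attention weights.

    candidates: list of (pitch, yaw) pairs in radians (assumed representation).
    scores: one raw attention score per candidate (assumed to be produced
            elsewhere, e.g. by a small scoring network).
    Returns the fused (pitch, yaw) pair and the attention weights.
    """
    if not candidates or len(candidates) != len(scores):
        raise ValueError("need one score per candidate, and at least one candidate")
    weights = softmax(scores)
    pitch = sum(w * c[0] for w, c in zip(weights, candidates))
    yaw = sum(w * c[1] for w, c in zip(weights, candidates))
    return (pitch, yaw), weights


# Two candidates with equal scores: the fused direction is their plain average.
fused, weights = fuse_gaze([(0.0, 0.2), (0.2, 0.4)], [0.0, 0.0])
print(fused, weights)
```

Under these assumptions, a candidate with a much higher score dominates the fused estimate, which is one way an unreliable candidate (for example, one generated from an occluded eye image) could be down-weighted.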