Vicsgaze: a gaze estimation method using self-supervised contrastive learning

De Gu, Minghao Lv, Jianchu Liu
DOI: https://doi.org/10.1007/s00530-024-01458-x
IF: 3.9
2024-11-04
Multimedia Systems
Abstract: Existing deep learning-based gaze estimation methods achieve high accuracy, but their performance depends on large-scale datasets with gaze labels, which are time-consuming and expensive to collect. To address this, we propose VicsGaze, a self-supervised network that learns generalized gaze-aware representations without labeled data. We feed two gaze-specific augmented views of the same face image into a multi-branch convolutional re-parameterization encoder to obtain feature representations. Although the two augmented views give the original face image different appearances, the gaze direction they represent is the same. We then map these two representations into an embedding space and employ a novel loss function to optimize model training. Experiments demonstrate that VicsGaze achieves outstanding cross-dataset gaze estimation performance on several datasets. Moreover, VicsGaze outperforms supervised learning baselines when fine-tuned with few calibration samples.
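The two-view pipeline described in the abstract can be illustrated with a minimal, dependency-free sketch. Everything concrete here is an assumption made for illustration: the abstract does not specify the augmentations, the encoder internals, or the loss, so `augment` uses simple brightness and noise changes, `linear` is a toy stand-in for both the re-parameterization encoder and the embedding projection, and negative cosine similarity stands in for the paper's novel loss. It is not the authors' implementation.

```python
import math
import random

def augment(image, rng):
    """Appearance-only augmentation (hypothetical): random brightness gain plus
    pixel noise. No flip or rotation, so the gaze direction is unchanged."""
    gain = rng.uniform(0.7, 1.3)
    return [min(1.0, max(0.0, gain * p + rng.gauss(0.0, 0.02))) for p in image]

def linear(weights, x):
    """Matrix-vector product; a toy stand-in for the encoder and the projector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm + 1e-12)

def two_view_loss(image, encoder_w, projector_w, rng):
    """Stand-in objective (assumption): negative cosine similarity between the
    embeddings of two augmented views of the same face image."""
    view_a, view_b = augment(image, rng), augment(image, rng)
    z_a = linear(projector_w, linear(encoder_w, view_a))
    z_b = linear(projector_w, linear(encoder_w, view_b))
    return -cosine_similarity(z_a, z_b)

def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
image = [rng.random() for _ in range(64)]   # flattened 8x8 toy "face image"
encoder_w = random_matrix(16, 64, rng)      # shared by both views
projector_w = random_matrix(8, 16, rng)     # maps features to the embedding space
loss = two_view_loss(image, encoder_w, projector_w, rng)
```

Both views pass through the same weights, so lowering this loss pushes the encoder toward features that ignore appearance changes while keeping what the views share, which by construction includes the gaze direction. No gaze labels are used at any point.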
computer science, information systems, theory & methods
What problem does this paper attempt to address?