Abstract: We consider the problem of user-adaptive 3D gaze estimation. The performance of person-independent gaze estimation is limited due to interpersonal anatomical differences. Our goal is to provide a personalized gaze estimation model specifically adapted to a target user. Previous work on user-adaptive gaze estimation requires some labeled images of the target person to fine-tune the model at test time. However, this can be unrealistic in real-world applications, since it is cumbersome for an end-user to provide labeled images. In addition, previous work requires the training data to have both gaze labels and person IDs. This data requirement makes it infeasible to use some of the available data. To tackle these challenges, this paper proposes a new problem called efficient label-free user adaptation in gaze estimation. Our model only needs a few unlabeled images of a target user for the model adaptation. During offline training, we have some labeled source data without person IDs and some unlabeled person-specific data. Our proposed method uses a meta-learning approach to learn how to adapt to a new user with only a few unlabeled images. Our key technical innovation is to use a generalization bound from domain adaptation to define the loss function in meta-learning, so that our method can effectively make use of both the labeled source data and the unlabeled person-specific data during training. Extensive experiments validate the effectiveness of our method on several challenging benchmarks.

What problem does this paper attempt to address?

This paper addresses the labeled-data requirements of user-adaptive 3D gaze estimation. Existing methods need a small number of annotated images of the target user to fine-tune the model at test time. This is unrealistic in practice, because collecting annotations (especially 3D gaze labels) often requires professional equipment and is very difficult for ordinary users. Existing methods also require training data that contains both gaze labels and user IDs, which limits the range of usable data. To solve these problems, the paper proposes a new problem setting called **ELF-UA: Efficient Label-Free User Adaptation in Gaze Estimation**. Its main contributions are as follows:

1. **New problem setting**: Unlike existing methods, the proposed method needs only a small number of unannotated images of the target user for model adaptation. Training requires only one source dataset with gaze labels (without user IDs) and some person-specific datasets with user IDs but without gaze labels.
2. **Innovative method**: The paper introduces a surrogate loss function, based on a generalization bound from domain adaptation, for outer-loop optimization in the MAML framework. This lets the model adapt to new users with only a small number of unannotated images.
3. **Experimental verification**: Extensive experiments on multiple challenging benchmark datasets demonstrate the effectiveness of the method. It significantly outperforms alternative methods and achieves performance comparable to current state-of-the-art methods.

### Detailed problem description

#### Limitations of existing methods

- **Label requirements**: Existing methods such as [Park et al., 2019] require a small number of annotated images of the target user for fine-tuning, which is impractical in real applications.
- **Data requirements**: These methods also require training data that contains both gaze labels and user IDs, which limits the range of data that can be used.

#### Characteristics of the new problem setting

- **Unlabeled adaptation**: Only a small number of unannotated images of the target user are needed for model adaptation.
- **Flexible data use**: Training uses one source dataset with gaze labels (without user IDs) together with person-specific datasets that have user IDs but no gaze labels.

#### Innovation points of the method

- **Meta-learning framework**: Building on model-agnostic meta-learning (MAML), the paper proposes a new meta-learning framework for learning self-supervised user-adaptive gaze estimation.
- **Surrogate loss function**: A surrogate loss function based on a generalization bound from domain adaptation is used for outer-loop optimization in the MAML framework, enabling the model to use unannotated data for rapid adaptation.

Through these innovations, the ELF-UA method can adapt the model to new users with only a small number of unannotated images, which addresses the limitations of existing methods in practical applications.
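As a rough illustration of what a bound-derived surrogate loss can look like: generalization bounds in domain adaptation typically upper-bound the target error by the source error plus a term that measures the divergence between the source and target distributions. The Python sketch below follows that shape, combining a supervised angular error on labeled source samples with a simple mean-feature gap to unlabeled target samples. The function names, the use of a mean-feature distance as the divergence proxy, and the `weight` parameter are all assumptions made for illustration; the summary above does not specify the paper's actual loss terms.

```python
import math


def angular_error_deg(pred, target):
    """Angle in degrees between two 3D gaze direction vectors."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(t * t for t in target))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos))


def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def mean_feature_gap(src_feats, tgt_feats):
    """Euclidean distance between mean feature vectors.

    Hypothetical stand-in for the divergence term of a domain adaptation
    bound; it needs no labels on the target side.
    """
    mu_s, mu_t = mean_vector(src_feats), mean_vector(tgt_feats)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mu_s, mu_t)))


def surrogate_loss(src_preds, src_labels, src_feats, tgt_feats, weight=1.0):
    """Source error plus weighted divergence proxy (assumed form)."""
    source_err = sum(
        angular_error_deg(p, y) for p, y in zip(src_preds, src_labels)
    ) / len(src_preds)
    return source_err + weight * mean_feature_gap(src_feats, tgt_feats)


# Toy usage: two labeled source samples, one unlabeled target sample.
src_preds = [[0.0, 0.0, -1.0], [0.0, 0.0, -1.0]]
src_labels = [[0.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
src_feats = [[0.0, 0.0], [2.0, 0.0]]
tgt_feats = [[4.0, 4.0]]
loss = surrogate_loss(src_preds, src_labels, src_feats, tgt_feats, weight=0.5)
print(loss)
```

In a MAML-style setup, a quantity of this kind would be evaluated in the outer loop after the model has taken a few inner-loop steps on a user's unlabeled images, so that no target gaze labels are needed at any point. The sketch covers only the loss computation, not the inner or outer optimization.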

ELF-UA: Efficient Label-Free User Adaptation in Gaze Estimation

ADeLA: Automatic Dense Labeling with Attention for Viewpoint Shift in Semantic Segmentation

Global Adaptation Meets Local Generalization: Unsupervised Domain Adaptation for 3D Human Pose Estimation

GazeLabel

Gaze Estimation via Modulation-based Adaptive Network with Auxiliary Self-Learning

Domain-Adaptive Full-Face Gaze Estimation via Novel-View-Synthesis and Feature Disentanglement

An Individual-Difference-Aware Model for Cross-Person Gaze Estimation

FreeGaze: Resource-efficient Gaze Estimation via Frequency Domain Contrastive Learning

Unsupervised Domain Adaptation for 3D Human Pose Estimation

EasyGaze3D: Towards Effective and Flexible 3D Gaze Estimation from a Single RGB Camera

LG-Gaze: Learning Geometry-aware Continuous Prompts for Language-Guided Gaze Estimation

3DGazeNet: Generalizing Gaze Estimation with Weak-Supervision from Synthetic Views

Unsupervised Model Personalization while Preserving Privacy and Scalability: An Open Problem

Accurate Real‐time 3D Gaze Tracking Using a Lightweight Eyeball Calibration

Domain-Consistent and Uncertainty-Aware Network for Generalizable Gaze Estimation

GazeGaussian: High-Fidelity Gaze Redirection with 3D Gaussian Splatting

Adaptive Wasserstein Hourglass for Weakly Supervised RGB 3D Hand Pose Estimation

A Generalized and Robust Method Towards Practical Gaze Estimation on Smart Phone

Deep Multitask Gaze Estimation with a Constrained Landmark-Gaze Model

Improving Domain Generalization on Gaze Estimation via Branch-out Auxiliary Regularization

Appearance Debiased Gaze Estimation via Stochastic Subject-Wise Adversarial Learning