Imitation Learning with Human Eye Gaze via Multi-Objective Prediction

Ravi Kumar Thakur, MD-Nazmus Samin Sunbeam, Vinicius G. Goecks, Ellen Novoseller, Ritwik Bera, Vernon J. Lawhern, Gregory M. Gremillion, John Valasek, Nicholas R. Waytowich
2023-07-23
Abstract: Approaches for teaching learning agents via human demonstrations have been widely studied and successfully applied to multiple domains. However, the majority of imitation learning work utilizes only behavioral information from the demonstrator, i.e. which actions were taken, and ignores other useful information. In particular, eye gaze information can give valuable insight into where the demonstrator is allocating visual attention, and holds the potential to improve agent performance and generalization. In this work, we propose Gaze Regularized Imitation Learning (GRIL), a novel context-aware imitation learning architecture that learns concurrently from both human demonstrations and eye gaze to solve tasks where visual attention provides important context. We apply GRIL to a visual navigation task, in which an unmanned quadrotor is trained to search for and navigate to a target vehicle in a photorealistic simulated environment. We show that GRIL outperforms several state-of-the-art gaze-based imitation learning algorithms, simultaneously learns to predict human visual attention, and generalizes to scenarios not present in the training data. Supplemental videos and code can be found at https://sites.google.com/view/gaze-regularized-il/.
Machine Learning, Human-Computer Interaction, Robotics
What problem does this paper attempt to address?
The paper addresses how eye gaze data recorded during human demonstrations can be used to improve the performance and generalization of imitation learning agents. It observes that most existing imitation learning work relies only on behavioral information (the actions the demonstrator took) and ignores other potentially useful signals such as eye gaze. Gaze indicates where the demonstrator is allocating visual attention, which is particularly valuable in visually complex environments where the agent must identify the key regions of a scene. The paper proposes Gaze Regularized Imitation Learning (GRIL), which trains an agent on the demonstrator's actions and eye gaze concurrently, so that the agent learns to predict human visual attention while it learns the task. The method is evaluated on a visual navigation task. By using gaze as an attention signal during training, GRIL aims to reduce the amount of human demonstration data required and to make the resulting agents more autonomous and robust.
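The idea of learning from actions and gaze at the same time can be illustrated with a weighted sum of two prediction errors. The sketch below is only an illustration under stated assumptions, not the loss used in the paper: it assumes the agent outputs a control command and a gaze point in normalized image coordinates, uses mean squared error for both terms, and introduces a hypothetical `gaze_weight` parameter to balance them.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def gaze_regularized_loss(
    pred_action: Sequence[float],
    demo_action: Sequence[float],
    pred_gaze: Sequence[float],
    human_gaze: Sequence[float],
    gaze_weight: float = 1.0,  # hypothetical trade-off weight, not from the paper
) -> float:
    """Illustrative multi-objective loss: action error plus weighted gaze error."""
    # Behavior term: distance between the predicted command and the demonstrated one.
    action_loss = mse(pred_action, demo_action)
    # Gaze term: distance between the predicted gaze point and the recorded human gaze.
    gaze_loss = mse(pred_gaze, human_gaze)
    return action_loss + gaze_weight * gaze_loss


# The action matches the demonstration exactly, so only the gaze error contributes.
loss = gaze_regularized_loss(
    pred_action=[0.5, 0.0],
    demo_action=[0.5, 0.0],
    pred_gaze=[0.25, 0.75],
    human_gaze=[0.75, 0.75],
    gaze_weight=0.5,
)
print(loss)  # 0.0625
```

Setting `gaze_weight` to zero in this sketch recovers plain behavior cloning on actions alone, which is one way to see the gaze term as a regularizer on top of the imitation objective.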