Appearance-based Gaze Estimation with Multi-Modal Convolutional Neural Networks

Fei Wang,Yan Wang,Teng Li
DOI: https://doi.org/10.1117/12.2603762
2021-01-01
Abstract:Existing methods on appearance-based gaze estimation mostly regress gaze direction from eye images, neglecting facial information and head pose which can be much helpful. In this paper, we propose a robust appearance-based gaze estimation method that regresses gaze directions jointly from human face and eye. The face and eye regions are located based on the detected landmark points, and representations of the two modalities are modeled with the convolutional neural networks (CNN), which are finally combined for gaze estimation by a fused network. Furthermore, considering the various impact of different facial regions on human gaze, the spatial weights for facial area are learned automatically with an attention mechanism and are applied to refine the facial representation. Experimental results validate the benefits of fusing multiple modalities in gaze estimation on the Eyediap benchmark dataset, and the propose method can yield better performance to previous advanced methods.
What problem does this paper attempt to address?