Abstract: A visual scanpath represents the human eye movements made while scanning the visual field to acquire and receive visual information. Predicting visual scanpaths when a certain stimulus is presented plays an important role in modeling overt human visual attention and search behavior. In this paper, we present a framework based on an 'Inhibition of Return - Region of Interest' (IOR-ROI) recurrent mixture density network that learns to produce human-like visual scanpaths under task-free viewing conditions. The proposed model simultaneously predicts a sequence of ordered fixation positions and their corresponding fixation durations. Our model integrates bottom-up features and semantic features extracted by convolutional neural networks. The integrated feature maps are then fed into the IOR-ROI Long Short-Term Memory (LSTM), which is the core component of the proposed model. The IOR-ROI LSTM is a dual LSTM unit, consisting of the IOR-LSTM and the ROI-LSTM, that captures IOR dynamics and gaze shift behavior simultaneously. The IOR-LSTM simulates visual working memory to adaptively maintain and update visual information regarding previously fixated regions. The ROI-LSTM is responsible for predicting the next possible ROIs given the spatially inhibited image feature maps on a feature-wise basis. Fixation duration is predicted by a regression neural network given the viewing history and the image feature maps corresponding to the currently fixated ROI. To account for variations in eye movement patterns among subjects, a mixture density network is adopted to model the next fixation distribution as a Gaussian mixture, and the fixation duration is also modeled using a Gaussian distribution. Our model is evaluated on the OSIE and MIT low-resolution eye-tracking datasets, and experimental results indicate that the proposed method can achieve superior performance in predicting visual scanpaths. The code will be publicly available at https://github.com/sunwj/scanpath.
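To make the mixture density idea in the abstract concrete, the following is a minimal sketch of how the likelihood of an observed next fixation could be scored under a 2D Gaussian mixture. It is illustrative only: the function names `gaussian_mixture_pdf` and `fixation_nll`, the axis-aligned (diagonal) covariances, and the use of normalized image coordinates are all assumptions, since the abstract does not specify the parameterization used by the authors.

```python
import math


def gaussian_mixture_pdf(x, y, weights, means, sigmas):
    """Density of a 2D Gaussian mixture at the point (x, y).

    Assumption: each component has a diagonal covariance, so it is
    described by a mean (mu_x, mu_y) and standard deviations
    (sigma_x, sigma_y). `weights` are mixing coefficients summing to 1.
    """
    total = 0.0
    for w, (mx, my), (sx, sy) in zip(weights, means, sigmas):
        zx = (x - mx) / sx
        zy = (y - my) / sy
        norm = 1.0 / (2.0 * math.pi * sx * sy)
        total += w * norm * math.exp(-0.5 * (zx * zx + zy * zy))
    return total


def fixation_nll(x, y, weights, means, sigmas):
    """Negative log-likelihood of an observed fixation under the mixture."""
    return -math.log(gaussian_mixture_pdf(x, y, weights, means, sigmas))


# Hypothetical mixture with two candidate regions of interest,
# expressed in normalized image coordinates.
weights = [0.7, 0.3]
means = [(0.3, 0.4), (0.8, 0.6)]
sigmas = [(0.05, 0.05), (0.10, 0.10)]

near = fixation_nll(0.31, 0.41, weights, means, sigmas)  # close to a mode
far = fixation_nll(0.55, 0.10, weights, means, sigmas)   # far from both modes
```

In a training setting, a loss of this form would typically be averaged over the fixations of all subjects, which is one way a mixture can absorb inter-subject variation: different subjects' next fixations can fall under different components. A fixation near a mixture mode (`near`) receives a lower negative log-likelihood than one far from every mode (`far`).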

Human Scanpath Estimation Based on Semantic Segmentation Guided by Common Eye Fixation Behaviors

Human Scanpath Prediction Based on Deep Convolutional Saccadic Model

Imitating the Human Visual System for Scanpath Predicting

Scanpath Mining of Eye Movement Trajectories for Visual Attention Analysis

Scanpaths Generation for Target Search Based on Deep Learning

Scanpath Prediction Based On High-Level Features And Memory Bias

Simulating human saccadic scanpaths on natural images

Modeling of Human Saccadic Scanpaths Based on Visual Saliency

Scanpath Prediction Via Semantic Representation of the Scene

Individual Trait Oriented Scanpath Prediction for Visual Attention Analysis

Predicting Human Scanpaths in Visual Question Answering

Simulate Human Saccadic Scan-Paths in Target Searching

Scanpaths Prediction Based on Signals Competition

Visual Scanpath Prediction Using IOR-ROI Recurrent Mixture Density Network

A combined model for scan path in pedestrian searching

Modeling Programmer Attention as Scanpath Prediction

Representative Scanpath Identification for Group Viewing Pattern Analysis

Gaze-based Human Intention Prediction in the Hybrid Foraging Search Task

Predicting Human Saccadic Scanpaths Based on Iterative Representation Learning

Beyond Average: Individualized Visual Scanpath Prediction

GazeXplain: Learning to Predict Natural Language Explanations of Visual Scanpaths