Abstract: A visual scanpath represents human eye movements while scanning the visual field to acquire visual information. Predicting visual scanpaths when a certain stimulus is presented plays an important role in modeling overt human visual attention and search behavior. In this paper, we present a framework based on an 'Inhibition of Return - Region of Interest' (IOR-ROI) recurrent mixture density network that learns to produce human-like visual scanpaths under task-free viewing conditions. The proposed model simultaneously predicts a sequence of ordered fixation positions and their corresponding fixation durations. Our model integrates bottom-up features and semantic features extracted by convolutional neural networks. The integrated feature maps are then fed into the IOR-ROI Long Short-Term Memory (LSTM), which is the core component of the proposed model. The IOR-ROI LSTM is a dual LSTM unit, consisting of the IOR-LSTM and the ROI-LSTM, that captures IOR dynamics and gaze shift behavior simultaneously. The IOR-LSTM simulates visual working memory to adaptively maintain and update visual information regarding previously fixated regions. The ROI-LSTM is responsible for predicting the next possible ROIs given the spatially inhibited image feature maps on a feature-wise basis. Fixation duration is predicted by a regression neural network given the viewing history and the image feature maps corresponding to the currently fixated ROI. To account for variations in eye movement patterns among subjects, a mixture density network is adopted to model the next fixation distribution as a Gaussian mixture, and the fixation duration is also modeled using a Gaussian distribution. Our model is evaluated on the OSIE and MIT low-resolution eye-tracking datasets, and experimental results indicate that the proposed method can achieve superior performance in predicting visual scanpaths. The code will be publicly available at https://github.com/sunwj/scanpath.
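
To make the mixture density idea concrete, the following is a minimal sketch of how a next-fixation distribution expressed as a 2D Gaussian mixture could be evaluated and sampled. It is not the authors' implementation: the function names, the axis-aligned (diagonal) covariance, and the parameter values are all assumptions chosen for illustration, and in the actual model such parameters would be produced by the network rather than written by hand.

```python
import math
import random

def gmm_log_likelihood(point, weights, means, sigmas):
    """Log-density of a 2D point under a mixture of axis-aligned Gaussians.

    Assumption: each component has a diagonal covariance (sx, sy).
    """
    x, y = point
    log_terms = []
    for w, (mx, my), (sx, sy) in zip(weights, means, sigmas):
        log_norm = -math.log(2.0 * math.pi * sx * sy)
        log_exp = -0.5 * (((x - mx) / sx) ** 2 + ((y - my) / sy) ** 2)
        log_terms.append(math.log(w) + log_norm + log_exp)
    # Log-sum-exp over components for numerical stability.
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))

def sample_fixation(weights, means, sigmas, rng):
    """Pick a mixture component by weight, then draw a position from it."""
    k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    (mx, my), (sx, sy) = means[k], sigmas[k]
    return rng.gauss(mx, sx), rng.gauss(my, sy)

# Hypothetical mixture parameters in normalized image coordinates.
weights = [0.7, 0.3]
means = [(0.3, 0.4), (0.8, 0.6)]
sigmas = [(0.05, 0.05), (0.10, 0.08)]

rng = random.Random(0)
next_fix = sample_fixation(weights, means, sigmas, rng)

# Negative log-likelihood of an observed fixation, the usual training
# objective for a mixture density network.
nll = -gmm_log_likelihood((0.3, 0.4), weights, means, sigmas)
```

Modeling the next fixation as a mixture rather than a single point lets one set of parameters cover several plausible gaze targets, which matches the abstract's motivation of accounting for variation among subjects.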

Scanpaths Generation for Target Search Based on Deep Learning

Imitating the Human Visual System for Scanpath Predicting

Human Scanpath Prediction Based on Deep Convolutional Saccadic Model

Human Scanpath Estimation Based on Semantic Segmentation Guided by Common Eye Fixation Behaviors

Scanpath Prediction Based On High-Level Features And Memory Bias

Scanpath Prediction Via Semantic Representation of the Scene

A combined model for scan path in pedestrian searching

Scanpaths Prediction Based on Signals Competition

Scanpath Mining of Eye Movement Trajectories for Visual Attention Analysis

EyeFormer: Predicting Personalized Scanpaths with Transformer-Guided Reinforcement Learning

Predicting Human Scanpaths in Visual Question Answering

GazeXplain: Learning to Predict Natural Language Explanations of Visual Scanpaths

Simulate Human Saccadic Scan-Paths in Target Searching

Gaze-based Human Intention Prediction in the Hybrid Foraging Search Task

Visual Scanpath Prediction Using IOR-ROI Recurrent Mixture Density Network

OAT: Object-Level Attention Transformer for Gaze Scanpath Prediction

Improved Scan Path Model Using Visual Saliency Dataset

Predicting Human Saccadic Scanpaths Based on Iterative Representation Learning

Scanpath Prediction for Visual Attention Using IOR-ROI LSTM

Modeling Programmer Attention as Scanpath Prediction

Unified Dynamic Scanpath Predictors Outperform Individually Trained Neural Models