Abstract: The saliency map and the object map are two contrasting hypotheses about the mechanism the visual system uses to guide eye fixations when humans freely view natural images. Most computational studies define saliency as outliers in the distributions of low-level features and propose it as an important factor for predicting eye fixations. Psychophysical studies, however, suggest that high-level objects predict eye fixations more accurately and that early saliency has only a minor effect. This view has in turn been challenged by a study reporting the opposite result, which suggests that the role of object-level features needs further investigation. In addition, little is known about the role of features intermediate between the low level and the object level. In this paper, we construct two models, one based on mid-level features and one based on object-level features, and compare their performance against models based on low-level features. Quantitative evaluation on three benchmark natural image fixation data sets demonstrates that the mid-level model outperforms the state-of-the-art low-level models by a significant margin, whereas the object-level model is inferior to most low-level models. Quantitative evaluation on a video fixation data set demonstrates that both the mid-level and the object-level models outperform the state-of-the-art low-level models, with the object-level model performing better under three of the four standard metrics. Combining the two proposed models yields even higher performance, whereas additionally incorporating the best low-level model yields negligible improvement on all of the data sets. Taken together, these results indicate that higher-level features may be more effective than low-level features for predicting eye fixations on natural images in the free-viewing condition.

Visual-Verbal Consistency Of Image Saliency

Efficient Classification Using Salient Regions

SALICON: Saliency in Context

Human Attention in Image Captioning: Dataset and Analysis

Co-saliency Detection Based on Hierarchical Consistency

Predicting eye fixations with higher-level visual features

COSE: A Consistency-Sensitivity Metric for Saliency on Image Classification

What Do Deep Saliency Models Learn about Visual Attention?

Emotional Attention: A Study of Image Sentiment and Visual Attention

Visual saliency and semantic incongruency influence eye movements when inspecting pictures

2 Conditional Saliency 2.1 Lossy Coding

A biologically inspired computational model for image saliency detection

Salient Locations Search Based on Human Visual Attention: an Experimental Analysis

Image Visual Attention Computation and Application Via the Learning of Object Attributes

Semantic and Contrast-Aware Saliency

Low-level and High-Level Prior Learning for Visual Saliency Estimation

An Object-Oriented Visual Saliency Detection Framework Based on Sparse Coding Representations

Rethinking of the Image Salient Object Detection: Object-level Semantic Saliency Re-ranking First, Pixel-wise Saliency Refinement Latter

A Study on Interest Point Guided Visual Saliency

Aligning Where to See and What to Tell: Image Caption with Region-Based Attention and Scene Factorization

Top-down Visual Saliency Guided by Captions