Abstract (Key Points): We review recent work on computational models of focal visual attention, with emphasis on the bottom-up, saliency- or image-based control of attentional deployment. We highlight five important trends that have emerged from the computational literature. First, the perceptual saliency of stimuli critically depends on surrounding context; that is, the same object may or may not appear salient depending on the nature and arrangement of other objects in the scene. Computationally, this means that contextual influences, such as non-classical surround interactions, must be included in models. Second, a unique 'saliency map' that topographically encodes stimulus conspicuity over the visual scene has proved to be an efficient and plausible bottom-up control strategy. Many successful models are based on such an architecture, and electrophysiological as well as psychophysical studies have recently supported the idea that saliency is explicitly encoded in the brain. Third, inhibition-of-return (IOR), the process by which the currently attended location is transiently inhibited, is a critical element of attentional deployment. Without IOR, attention would be endlessly attracted towards the most salient stimulus. IOR thus implements a memory of recently visited locations, and allows attention to thoroughly scan the visual environment. Fourth, attention and eye movements interact tightly, posing computational challenges with respect to the coordinate system used to control attention. Understanding the interaction between overt and covert attention is particularly important for models concerned with visual search. Last, scene understanding and object recognition strongly constrain the selection of attended locations. Although several models have approached, in an information-theoretical sense, the problem of optimally deploying attention to analyse a scene, biologically plausible implementations of such a computational strategy remain to be developed.
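
The interplay between the saliency map and inhibition-of-return described in the abstract can be illustrated with a small sketch. This is not the implementation of any model covered by the review: the function name `scan_with_ior`, the square inhibition neighbourhood, the `ior_radius` parameter, and the toy saliency values are all assumptions made for illustration. The sketch also makes inhibition permanent, whereas the abstract describes it as transient.

```python
def scan_with_ior(saliency, n_fixations, ior_radius=1):
    """Return the sequence of attended (row, col) locations.

    Hypothetical helper: attention repeatedly jumps to the most salient
    location (winner-take-all). If ior_radius is None, no inhibition is
    applied; otherwise a square neighbourhood around each attended
    location is suppressed (a simplification: real IOR is transient).
    """
    # Work on a copy so the caller's map is left unchanged.
    sal = [row[:] for row in saliency]
    rows, cols = len(sal), len(sal[0])
    fixations = []
    for _ in range(n_fixations):
        # Winner-take-all: pick the currently most salient location.
        r, c = max(
            ((i, j) for i in range(rows) for j in range(cols)),
            key=lambda rc: sal[rc[0]][rc[1]],
        )
        fixations.append((r, c))
        if ior_radius is None:
            continue  # no IOR: the same winner will be selected again
        # Inhibition of return: suppress the neighbourhood of the winner.
        for i in range(max(0, r - ior_radius), min(rows, r + ior_radius + 1)):
            for j in range(max(0, c - ior_radius), min(cols, c + ior_radius + 1)):
                sal[i][j] = float("-inf")
    return fixations


# Toy saliency map (made-up values) with three conspicuous locations.
saliency = [
    [0.1, 0.2, 0.1, 0.0, 0.3],
    [0.2, 0.9, 0.2, 0.0, 0.1],
    [0.1, 0.2, 0.1, 0.0, 0.6],
    [0.0, 0.0, 0.0, 0.0, 0.1],
    [0.5, 0.1, 0.0, 0.0, 0.1],
]

print(scan_with_ior(saliency, 3, ior_radius=1))     # scans three distinct locations
print(scan_with_ior(saliency, 3, ior_radius=None))  # stuck on the global maximum
```

With inhibition enabled, the scan moves from the strongest peak to progressively weaker ones; with it disabled, every fixation lands on the same location, which is the failure mode the abstract attributes to the absence of IOR.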

Learning to Model Task-Oriented Attention

Learning Stereoscopic Visual Attention Model for 3d Video

Learning visual saliency based on object's relative relationship

Salient Locations Search Based on Human Visual Attention: an Experimental Analysis

Image Visual Attention Computation and Application Via the Learning of Object Attributes

What Do Deep Saliency Models Learn about Visual Attention?

Predicting human gaze beyond pixels

Learning Attention Map from Images

Learning high-level concepts by training a deep network on eye fixations

Weakly Supervised Visual Saliency Prediction

Spatial-Aware Object-Level Saliency Prediction by Learning Graphlet Hierarchies

On Semantic-Instructed Attention: from Video Eye-Tracking Dataset to Memory-Guided Probabilistic Saliency Model

A Deep Model of Visual Attention for Saliency Detection on 3D Objects

Computational modelling of visual attention

Deep saliency models learn low-, mid-, and high-level features to predict scene attention

Opponent and Feedback: Visual Attention Captured

Low-level and High-Level Prior Learning for Visual Saliency Estimation

Inferring Salient Objects from Human Fixations

A structure-guided approach to the prediction of natural image saliency

Video Saliency Detection via Dynamic Consistent Spatio-Temporal Attention Modelling

Learning to Detect A Salient Object