Abstract: Visual attention is one of the most significant characteristics for selecting and understanding the redundant outside world. The human visual system cannot process all information simultaneously because of the visual information bottleneck. To reduce redundant visual input, it focuses mainly on the dominant parts of a scene. Modeling this selection computationally is commonly known as visual saliency map prediction. This paper proposes a new psychophysical saliency prediction architecture, WECSF, inspired by the multi-channel model of visual cortex functioning in humans. The model consists of opponent color channels, a wavelet transform, a wavelet energy map, and a contrast sensitivity function, which together extract low-level image features and approximate the human visual system as closely as possible. The proposed model is evaluated on several datasets, including MIT1003, MIT300, TORONTO, SID4VAM, and UCF Sports. We also compare its saliency prediction performance, quantitatively and qualitatively, with that of other state-of-the-art models. Our model achieves stable and better performance across different metrics on natural images, psychophysical synthetic images, and dynamic videos. Additionally, we found that Fourier and spectral-inspired saliency prediction models outperformed other state-of-the-art non-neural-network models, and even deep neural network models, on psychophysical synthetic images. This finding can be explained and supported by the Fourier Vision Hypothesis. We also suggest that deep neural networks need specific architectures and goals to predict saliency on psychophysical synthetic images better and more reliably. Finally, the proposed model could serve as a computational model of the primate visual system and help us understand its mechanisms.
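The four components named in the abstract can be illustrated with a minimal sketch. This is not the WECSF implementation: the opponent-channel formulas, the hand-written Haar transform, the Mannos-Sakrison contrast sensitivity curve, and the `levels` and `top_freq` parameters are all assumptions chosen for illustration, and the mapping from wavelet level to cycles per degree in practice depends on viewing conditions.

```python
import numpy as np

def opponent_channels(rgb):
    """Split an RGB image (H, W, 3, floats in [0, 1]) into opponent channels.
    These formulas are a common textbook choice, assumed here."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    lum = (r + g + b) / 3.0      # achromatic channel
    rg = r - g                   # red-green opponency
    by = b - (r + g) / 2.0       # blue-yellow opponency
    return [lum, rg, by]

def haar_level(x):
    """One level of a 2D Haar transform; x needs even height and width."""
    s = np.sqrt(2.0)
    lo = (x[0::2, :] + x[1::2, :]) / s   # low-pass along rows
    hi = (x[0::2, :] - x[1::2, :]) / s   # high-pass along rows
    ll = (lo[:, 0::2] + lo[:, 1::2]) / s
    lh = (lo[:, 0::2] - lo[:, 1::2]) / s
    hl = (hi[:, 0::2] + hi[:, 1::2]) / s
    hh = (hi[:, 0::2] - hi[:, 1::2]) / s
    return ll, (lh, hl, hh)

def csf_weight(f):
    """Mannos-Sakrison contrast sensitivity at f cycles/degree (assumed CSF)."""
    return 2.6 * (0.0192 + 0.114 * f) * np.exp(-(0.114 * f) ** 1.1)

def saliency_sketch(rgb, levels=3, top_freq=16.0):
    """CSF-weighted sum of wavelet energy maps over opponent channels.
    Height and width must be divisible by 2**levels.
    top_freq is a hypothetical frequency assigned to the finest level."""
    h, w = rgb.shape[:2]
    sal = np.zeros((h, w))
    for chan in opponent_channels(rgb):
        approx = chan
        for lev in range(levels):
            approx, details = haar_level(approx)
            # Energy map: squared detail coefficients summed over orientations.
            energy = sum(d ** 2 for d in details)
            # Upsample back to image size by block replication.
            scale = 2 ** (lev + 1)
            energy_full = np.kron(energy, np.ones((scale, scale)))
            # Each coarser level is assumed to sit one octave lower.
            sal += csf_weight(top_freq / 2 ** lev) * energy_full
    peak = sal.max()
    return sal / peak if peak > 0 else sal

# Usage: a red square on a gray background; its border should stand out.
img = np.full((64, 64, 3), 0.5)
img[21:43, 21:43] = [1.0, 0.0, 0.0]
sal = saliency_sketch(img)
```

In this sketch the flat background receives zero saliency and the response concentrates along the border of the square, which is the qualitative behavior expected from an energy map built on wavelet detail coefficients.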

A Visual-Attention Model Using Earth Mover's Distance-Based Saliency Measurement and Nonlinear Feature Combination

Learning Stereoscopic Visual Attention Model for 3D Video

Video Saliency Detection via Dynamic Consistent Spatio-Temporal Attention Modelling

A biologically inspired computational model for image saliency detection

Visual Attention Model Based on Fusion of Spatiotemporal Features

A Multimodal Saliency Model For Videos With High Audio-Visual Correspondence

Visual Saliency Detection Based on Topographic Independent Component Analysis

Visual attention computational model using Gabor decomposition and 2D entropy

Measuring visual saliency by Site Entropy Rate

Visual saliency detection by integrating spatial position prior of object with background cues

A Biological Inspired Features Based Saliency Map

A Visual Saliency Model Based on Haar-Like Features and Structural Complexity

What Do Deep Saliency Models Learn about Visual Attention?

Semantic and Contrast-Aware Saliency

Saliency Detection with Bilateral Absorbing Markov Chain Guided by Depth Information

A Psychophysically Oriented Saliency Map Prediction Model

Learning to Predict Salient Faces: A Novel Visual-Audio Saliency Model

Opponent and Feedback: Visual Attention Captured

SUM: Saliency Unification through Mamba for Visual Attention Modeling

Bio-driven visual saliency detection with color factor

Visual Saliency Detection: an Information Theoretic Algorithm Combined Long-term with Short-term Features