Abstract: Accurately classifying sceneries with different spatial configurations is an indispensable technique in computer vision and intelligent systems, with applications such as scene parsing, robot motion planning, and autonomous driving. Deep recognition models have achieved remarkable performance over the past decade. As far as we know, however, these deep architectures cannot explicitly encode human visual perception, that is, the sequence of gaze movements and the subsequent cognitive processes. In this article, a biologically inspired deep model is proposed for scene classification, in which human gaze behaviors are robustly discovered and represented by a unified deep active learning (UDAL) framework. More specifically, to characterize object components of varied sizes, an objectness measure is employed to decompose each scenery into a set of semantically aware object patches. To represent each region at a low level, a local-global feature fusion scheme is developed that optimally integrates multimodal features by automatically calculating each feature's weight. To mimic human visual perception of various sceneries, we develop UDAL, which hierarchically represents human gaze behavior by recognizing semantically important regions within the scenery. Importantly, UDAL combines semantically salient region detection and deep gaze shifting path (GSP) representation learning into a principled framework that requires only partial semantic tags. Meanwhile, by incorporating a sparsity penalty, contaminated or redundant low-level regional features are avoided. Finally, the deep GSP features learned from all scene images are integrated to form an image kernel machine, which is subsequently fed into a kernel SVM to classify different sceneries. Experimental evaluations on six well-known scenery sets (including remote sensing images) demonstrate the competitiveness of our approach.
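The final step of the pipeline, turning per-image deep GSP features into an image kernel for a kernel SVM, can be illustrated with a minimal sketch. The abstract does not specify the kernel, so everything here is an assumption made for illustration: each image is represented as a small set of feature vectors, and the kernel between two images is taken to be the average pairwise Gaussian (RBF) similarity between their feature sets. The function names `rbf`, `image_kernel`, and `gram_matrix`, the `gamma` value, and the toy data are hypothetical and do not come from the article.

```python
import math

def rbf(u, v, gamma=0.5):
    """Gaussian (RBF) similarity between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

def image_kernel(feats_a, feats_b, gamma=0.5):
    """Assumed image kernel: average pairwise RBF similarity between two feature sets."""
    total = sum(rbf(u, v, gamma) for u in feats_a for v in feats_b)
    return total / (len(feats_a) * len(feats_b))

def gram_matrix(images, gamma=0.5):
    """Kernel (Gram) matrix over a list of images, one row and column per image."""
    n = len(images)
    return [[image_kernel(images[i], images[j], gamma) for j in range(n)]
            for i in range(n)]

# Toy data (hypothetical): each image is a list of 2-D stand-ins for GSP features.
images = [
    [[0.0, 0.1], [0.2, 0.0]],   # scene type A
    [[0.1, 0.0], [0.0, 0.2]],   # scene type A
    [[3.0, 3.1], [2.9, 3.0]],   # scene type B
]
K = gram_matrix(images)
```

A matrix like `K` is the kind of input accepted by SVM implementations that support precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`, which is fitted on the training Gram matrix and the class labels. In this toy example the two type-A images score much higher against each other than against the type-B image, which is the behavior a kernel SVM relies on.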

Saliency Prediction with Scene Structural Guidance

A structure-guided approach to the prediction of natural image saliency

Learning Stereoscopic Visual Attention Model for 3d Video

PerimetryNet: A Multiscale Fine Grained Deep Network for Three-Dimensional Eye Gaze Estimation Using Visual Field Analysis

Contour-guided saliency detection with long-range interactions

Spatial-Aware Object-Level Saliency Prediction by Learning Graphlet Hierarchies

Revisiting Video Saliency Prediction in the Deep Learning Era

Predicting human gaze beyond pixels

Multi-Camera Saliency

Saliency Guided Contrastive Learning on Scene Images

Transcending Pixels: Boosting Saliency Detection via Scene Understanding from Aerial Imagery

Learning to Predict Salient Faces: A Novel Visual-Audio Saliency Model

What Do Deep Saliency Models Learn about Visual Attention?

Saliency In Crowd

Deep saliency models learn low-, mid-, and high-level features to predict scene attention

A Computational Model for Stereoscopic Visual Saliency Prediction

Revisiting Video Saliency: A Large-scale Benchmark and a New Model

A Deep Spatial Contextual Long-term Recurrent Convolutional Network for Saliency Detection

Saliency Prediction with External Knowledge

Learning to Model Task-Oriented Attention

Bioinspired Scene Classification by Deep Active Learning With Remote Sensing Applications