Abstract: Accurately classifying sceneries with different spatial configurations is an indispensable technique in computer vision and intelligent systems, with applications such as scene parsing, robot motion planning, and autonomous driving. Remarkable performance has been achieved by deep recognition models in the past decade. As far as we know, however, these deep architectures are incapable of explicitly encoding human visual perception, that is, the sequence of gaze movements and the subsequent cognitive processes. In this article, a biologically inspired deep model is proposed for scene classification, where human gaze behaviors are robustly discovered and represented by a unified deep active learning (UDAL) framework. More specifically, to characterize objects' components with varied sizes, an objectness measure is employed to decompose each scenery into a set of semantically aware object patches. To represent each region at a low level, a local-global feature fusion scheme is developed, which optimally integrates multimodal features by automatically calculating each feature's weight. To mimic the human visual perception of various sceneries, we develop UDAL, which hierarchically represents human gaze behavior by recognizing semantically important regions within the scenery. Importantly, UDAL combines semantically salient region detection and deep gaze shifting path (GSP) representation learning into a principled framework, where only partial semantic tags are required. Meanwhile, by incorporating a sparsity penalty, contaminated or redundant low-level regional features can be intelligently avoided. Finally, the learned deep GSP features from the entire scene images are integrated to form an image kernel machine, which is subsequently fed into a kernel SVM to classify different sceneries. Experimental evaluations on six well-known scenery sets (including remote sensing images) have shown the competitiveness of our approach.
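
The final step described in the abstract, turning per-image sets of deep GSP features into an image-level kernel for a kernel SVM, can be illustrated with a small sketch. The abstract does not give the kernel's formula, so everything below is an assumption made for illustration: each image is taken to be a list of fixed-length feature vectors (one per gaze shifting path), and the image kernel is taken to be the average pairwise Gaussian (RBF) similarity between two such sets. The names `rbf`, `image_kernel`, `gram_matrix`, and the `gamma` value are hypothetical and are not taken from the paper.

```python
import math


def rbf(u, v, gamma=0.5):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def image_kernel(feats_a, feats_b, gamma=0.5):
    """Average pairwise RBF similarity between two sets of feature vectors.

    Assumption: this averaging kernel stands in for the paper's image
    kernel, whose exact form is not stated in the abstract.
    """
    total = sum(rbf(u, v, gamma) for u in feats_a for v in feats_b)
    return total / (len(feats_a) * len(feats_b))


def gram_matrix(images, gamma=0.5):
    """Kernel (Gram) matrix over a list of images."""
    n = len(images)
    return [
        [image_kernel(images[i], images[j], gamma) for j in range(n)]
        for i in range(n)
    ]


# Toy data (made up): three "images", each with two 2-D path features.
# The first two images are close in feature space; the third is far away.
images = [
    [[0.0, 1.0], [0.1, 0.9]],
    [[0.0, 1.1], [0.2, 1.0]],
    [[5.0, 5.0], [5.1, 4.9]],
]
K = gram_matrix(images)
```

A Gram matrix of this kind can be passed to any SVM implementation that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`, together with the scene labels of the training images.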

Learning Scene-Specific Object Detectors Based on a Generative-Discriminative Model with Minimal Supervision

Autonomous Learning Framework Based on Online Hybrid Classifier for Multi-view Object Detection in Video

Unsupervised Object Detection with Scene-Adaptive Concept Learning

Unsupervised Domain-Adaptive Scene-Specific Pedestrian Detection for Static Video Surveillance

Adaptive Deep Convolutional Neural Networks for Scene-Specific Object Detection

Learning Complex Background Based on Multi-scale Discriminative Model

Object Detection from Scratch with Deep Supervision

DSOD: Learning Deeply Supervised Object Detectors from Scratch

Automatic Weakly Supervised Object Detection from High Spatial Resolution Remote Sensing Images Via Dynamic Curriculum Learning

Small Object Detection by Generative and Discriminative Learning

Exploiting Target Data to Learn Deep Convolutional Networks for Scene-Adapted Human Detection

General Geometry-aware Weakly Supervised 3D Object Detection

Unsupervised inner-point-pairs model for unseen-scene and online moving object detection

Generative Adversarial Learning Towards Fast Weakly Supervised Detection

Multilevel Unsupervised Domain Adaptation for Single-Stage Object Detection in Remote Sensing Images

Self-Training for Domain Adaptive Scene Text Detection

Learning Object Detectors from Scratch with Gated Recurrent Feature Pyramids

Single-Domain Generalized Object Detection in Urban Scene Via Cyclic-Disentangled Self-Distillation

Learning Dynamic Scene-Conditioned 3D Object Detectors

Approaching Outside: Scaling Unsupervised 3D Object Detection from 2D Scene

Bioinspired Scene Classification by Deep Active Learning With Remote Sensing Applications