Abstract:Accurately classifying sceneries with different spatial configurations is an indispensable technique in computer vision and intelligent systems, for example, scene parsing, robot motion planning, and autonomous driving. Remarkable performance has been achieved by the deep recognition models in the past decade. As far as we know, however, these deep architectures are incapable of explicitly encoding the human visual perception, that is, the sequence of gaze movements and the subsequent cognitive processes. In this article, a biologically inspired deep model is proposed for scene classification, where the human gaze behaviors are robustly discovered and represented by a unified deep active learning (UDAL) framework. More specifically, to characterize objects' components with varied sizes, an objectness measure is employed to decompose each scenery into a set of semantically aware object patches. To represent each region at a low level, a local-global feature fusion scheme is developed which optimally integrates multimodal features by automatically calculating each feature's weight. To mimic the human visual perception of various sceneries, we develop the UDAL that hierarchically represents the human gaze behavior by recognizing semantically important regions within the scenery. Importantly, UDAL combines the semantically salient region detection and the deep gaze shifting path (GSP) representation learning into a principled framework, where only the partial semantic tags are required. Meanwhile, by incorporating the sparsity penalty, the contaminated/redundant low-level regional features can be intelligently avoided. Finally, the learned deep GSP features from the entire scene images are integrated to form an image kernel machine, which is subsequently fed into a kernel SVM to classify different sceneries. Experimental evaluations on six well-known scenery sets (including remote sensing images) have shown the competitiveness of our approach.

Perceptually Learning Multi-View Sparse Representation for Scene Categorization.

Multi-scale sparse representation based scene categorization

Multi-View Stereo Representation Revist: Region-Aware MVSNet

Scene Categorization by Deeply Learning Gaze Behavior in a Semisupervised Context

Perceptual Visual Feature Learning With Applications in Sports Educational Image Understanding

Scene categorization via supervised subspace modeling and sparse representation

Aggregation-Based Perceptual Deep Model for Scenic Image Recognition

Semantic Reconstruction based on RGB Image and Sparse Depth

Monocular Visual Place Recognition in LiDAR Maps via Cross-Modal State Space Model and Multi-View Matching

Knowledge-enhanced Multi-perspective Video Representation Learning for Scene Recognition

Double-Keggin-type Anion-templated Synthesis of a 3D Porous Coordination Polymer of Eu(III) Ions and dpdo Ligands

Bioinspired Scene Classification by Deep Active Learning With Remote Sensing Applications

Common Subspace Based Low-Rank and Joint Sparse Representation for Multi-view Face Recognition.

ParseMVS: Learning Primitive-aware Surface Representations for Sparse Multi-view Stereopsis

Comprehensive multi-view self-representations for clustering

Semantically consistent multi-view representation learning

Multi-scale locality preserving projection for partial multi-view incomplete multi-label learning

Mcentrist: A Multi-Channel Feature Generation Mechanism for Scene Categorization

Relationship Learning From Multisource Images via Spatial-Spectral Perception Network

SDL-MVS: View Space and Depth Deformable Learning Paradigm for Multi-View Stereo Reconstruction in Remote Sensing