Abstract:Visual scene recognition is an indispensable part of automatic localization and navigation. In the same scene, the appearance and viewpoint may be changed greatly, which is the largest challenge for some advanced unmanned systems,e.g. robot,vehicle and UAV,etc., to identify scenes where they have visited. Traditional methods have been subjected to hand-made feature-based paradigms for a long time, mainly relying on the prior knowledge of the designer, and are not sufficiently robust to extreme changing scenes. In this paper, we cope with scene recognition with automatically learning the representation of features from big image samples. Firstly, we propose a novel approach for scene recognition via training a slight-weight convolutional neural network (CNN) that overall has less complex and more efficient network architecture, and is trainable in the manner of end-to-end. The proposed approach uses the deep-leaning features of self-selection combining with light CNN process to perform high semantic understanding of visual scenes. Secondly, we propose to employ a salient region-based technology to extract the local feature representation of a specific scene region directly from the convolution layer based on self-selection mechanism, and each layer performs a linear operation with end-to-end manner. Furthermore, we also utilize probability statistics to calculate the total similarity of several regions in one scene to other regions, and finally rank the similarity scores to select the correct scene. We have conducted a lot of experiments to evaluate the results of performance by comparing four methods (namely, our proposed and other three well known and advanced methods). Experimental results show that the proposed method is more robust and accurate than other three well-known methods in extremely harsh environments (e. g. weak light and strong blur).

Weakly Supervised PatchNets : Learning Aggregated Patch Descriptors for Scene Recognition

Weakly Supervised PatchNets: Describing and Aggregating Local Patches for Scene Recognition

A lightweight and stochastic depth residual attention network for remote sensing scene classification

LPNet: A Remote Sensing Scene Classification Method Based on Large Kernel Convolution and Parameter Fusion

PatchNet: Maximize the Exploration of Congeneric Semantics for Weakly Supervised Semantic Segmentation

Learning Local Event-based Descriptor for Patch-based Stereo Matching

Weakly Supervised Semantic Segmentation via Progressive Patch Learning

PADENet: an Efficient and Robust Panoramic Monocular Depth Estimation Network for Outdoor Scenes.

Locally Supervised Deep Hybrid Model for Scene Recognition

Scene Recognition Using Deep Softpool Capsule Network Based on Residual Diverse Branch Block

Self-Selection Salient Region-Based Scene Recognition Using Slight-Weight Convolutional Neural Network

Learning Discriminative Visual Dictionary for Natural Scene Categorization

APC: Adaptive Patch Contrast for Weakly Supervised Semantic Segmentation

Learning local descriptors with multi-level feature aggregation and spatial context pyramid

Convolutional Patch Representations for Image Retrieval: an Unsupervised Approach

A holistic representation guided attention network for scene text recognition

Spatial Information Considered Network for Scene Classification

Weakly supervised segmentation via instance-aware propagation

An End-to-End Trainable Multi-Column CNN for Scene Recognition in Extremely Changing Environment

Harvesting Discriminative Meta Objects with Deep CNN Features for Scene Classification

Cross-Patch Relation Enhanced for Weakly Supervised Semantic Segmentation