Abstract: Scene understanding of remote sensing images is of great significance in various applications. A fundamental problem is how to construct representative features. Various convolutional neural network (CNN) architectures have been proposed to learn features from images automatically. However, is it right to configure the same architecture to learn from all data while ignoring the differences between images? This runs contrary to intuition: some images are clearly easier to recognize than others. The underlying problem is the gap between the characteristics of images and the features learned by specific network structures. Unfortunately, the literature so far lacks an analysis of the relationship between the two. In this paper, we explore this problem from three aspects. First, we build a vision-based pipeline for evaluating scene complexity that characterizes the intrinsic differences between images. Second, for remote sensing scenes of different complexity, we analyze the relationship between semantic concepts and feature representations, i.e., the scalability and hierarchy of features, which are essential elements of CNNs with different architectures. Third, we use class activation mapping (CAM), a visualization method that explains feature learning within neural networks, to analyze the relationship between scenes of different complexity and semantic feature representations. The experimental results show that complex scenes require deeper, multi-scale features, whereas simpler scenes require shallower, single-scale features. In addition, the concept of a complex scene depends more on the joint semantic representation of multiple objects. Furthermore, we propose a framework for predicting the scene complexity of an image and use it to design a depth- and scale-adaptive model. This model achieves higher performance with fewer parameters than the original model, demonstrating the potential significance of scene complexity.
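The CAM analysis mentioned in the abstract rests on a simple formula. For a network whose last convolutional layer produces feature maps f_k and is followed by global average pooling (GAP) and a linear classifier with weights w[c, k], the class activation map for class c is CAM_c(x, y) = sum_k w[c, k] * f_k(x, y). The sketch below shows that original CAM formulation in NumPy. It is not the authors' code: the abstract does not state which backbone or layer was visualized, so the GAP-plus-linear head, the array shapes, the random inputs, and the function names (`class_activation_map`, `normalize_map`, `gap_logits`) are all illustrative assumptions.

```python
import numpy as np


def class_activation_map(feature_maps: np.ndarray, fc_weights: np.ndarray, class_idx: int) -> np.ndarray:
    """Raw CAM for one image and one class (hypothetical helper).

    feature_maps: (K, H, W) activations of the last convolutional layer.
    fc_weights:   (C, K) weights of the linear classifier that follows GAP.
    Returns an (H, W) map: CAM_c(x, y) = sum_k w[c, k] * f_k(x, y).
    """
    return np.einsum("k,khw->hw", fc_weights[class_idx], feature_maps)


def normalize_map(cam: np.ndarray) -> np.ndarray:
    """Min-max scale a map to [0, 1] for display (a constant map becomes all zeros)."""
    shifted = cam - cam.min()
    peak = shifted.max()
    return shifted / peak if peak > 0 else shifted


def gap_logits(feature_maps: np.ndarray, fc_weights: np.ndarray) -> np.ndarray:
    """Class scores of an assumed GAP + linear head (bias omitted)."""
    pooled = feature_maps.mean(axis=(1, 2))  # (K,)
    return fc_weights @ pooled               # (C,)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.random((8, 7, 7))      # assumption: K=8 channels on a 7x7 grid
    weights = rng.normal(size=(3, 8))  # assumption: C=3 scene classes
    logits = gap_logits(feats, weights)
    top = int(np.argmax(logits))
    cam = class_activation_map(feats, weights, top)
    # The spatial mean of the raw CAM equals the class score, because
    # GAP and the linear layer are both linear and therefore commute.
    print(np.isclose(cam.mean(), logits[top]))
    print(normalize_map(cam).shape)
```

The last check shows why CAM can serve as an explanation. Because averaging and the linear layer commute, the class score is exactly the spatial mean of the raw map, so the map shows where in the scene the evidence for that class is located. In practice the (H, W) map is usually upsampled to the input resolution before being overlaid on the image.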

Understanding Visual Feature Reliance through the Lens of Complexity

Visual Complexity of Shapes: a Hierarchical Perceptual Learning Model

Visualizing and Understanding Neural Models in NLP

Simplicity in Complexity: Explaining Visual Complexity using Deep Segmentation Models

Quantitative Characterization of Semantic Gaps for Learning Complexity Estimation and Inference Model Selection

Complexity Matters: Dynamics of Feature Learning in the Presence of Spurious Correlations

Measures of Complexity for Large Scale Image Datasets

Vision at A Glance: Interplay between Fine and Coarse Information Processing Pathways

Scene Complexity: A New Perspective on Understanding the Scene Semantics of Remote Sensing and Designing Image-Adaptive Convolutional Neural Networks

Understanding Neural Networks Through Deep Visualization

CLIC: Contrastive Learning Framework for Unsupervised Image Complexity Representation

Learned feature representations are biased by complexity, learning order, position, and more

Depth and Representation in Vision Models

Fusing Multiple Visual Features for Image Complexity Evaluation

A Measure of the Complexity of Neural Representations based on Partial Information Decomposition

Simplicity Bias of Two-Layer Networks beyond Linearly Separable Data

Towards Complex Features: Competitive Receptive Fields In Unsupervised Deep Networks

Exploring Features and Attributes in Deep Face Recognition Using Visualization Techniques

Deeper Interpretability of Deep Networks

Understanding, Analyzing, and Optimizing the Complexity of Deep Models