Abstract:The advancement in satellite image sensors has enabled the acquisition of high-resolution remote sensing (HRRS) images. However, interpreting these images accurately and obtaining the computational power needed to do so is challenging due to the complexity involved. This manuscript proposed a multi-stream convolutional neural network (CNN) fusion framework that involves multi-scale and multi-CNN integration for HRRS image recognition. The pre-trained CNNs were used to learn and extract semantic features from multi-scale HRRS images. Feature extraction using pre-trained CNNs is more efficient than training a CNN from scratch or fine-tuning a CNN. Discriminative canonical correlation analysis (DCCA) was used to fuse deep features extracted across CNNs and image scales. DCCA reduced the dimension of the features extracted from CNNs while providing a discriminative representation by maximizing the within-class correlation and minimizing the between-class correlation. The proposed model has been evaluated on NWPU-RESISC45 and UC Merced datasets. The accuracy associated with DCCA was 10% and 6% higher than discriminant correlation analysis (DCA) in the NWPU-RESISC45 and UC Merced datasets. The advantage of DCCA was better demonstrated in the NWPU-RESISC45 dataset due to the incorporation of richer within-class variability in this dataset. While both DCA and DCCA minimize between-class correlation, only DCCA maximizes the within-class correlation and, therefore, attains better accuracy. The proposed framework achieved higher accuracy than all state-of-the-art frameworks involving unsupervised learning and pre-trained CNNs and 2–3% higher than the majority of fine-tuned CNNs. The proposed framework offers computational time advantages, requiring only 13 s for training in NWPU-RESISC45, compared to a day for fine-tuning the existing CNNs. Thus, the proposed framework achieves a favourable balance between efficiency and accuracy in HRRS image recognition.

Scene Understanding Based on Multi-Scale Pooling of Deep Learning Features

Dynamic Convolution Covariance Network Using Multi-Scale Feature Fusion for Remote Sensing Scene Image Classification

Deep Dual-Stream Network with Scale Context Selection Attention Module for Semantic Segmentation

Remote Sensing Scene Image Classification Model Based on Multi-Scale Features and Attention Mechanism

Enhanced Multi-Scale Feature Adaptive Fusion Sparse Convolutional Network for Large-Scale Scenes Semantic Segmentation

Performance Comparison Of Two Pooling Strategies For Remote Sensing Image Scene Classification

Multi-scale Matching Networks for Semantic Correspondence

Dynamic Parallel Pyramid Networks for Scene Recognition

Scene Recognition Based on Feature Learning from Multi-Scale Salient Regions

CMPF-UNet: a ConvNeXt multi-scale pyramid fusion U-shaped network for multi-category segmentation of remote sensing images

Scene recognition with CNNs: objects, scales and dataset bias

Effective Multiscale Residual Network With High-Order Feature Representation for Optical Remote Sensing Scene Classification

Knowledge Guided Disambiguation for Large-Scale Scene Classification with Multi-Resolution Cnns

A Remote-Sensing Scene-Image Classification Method Based on Deep Multiple-Instance Learning with a Residual Dense Attention ConvNet

Long Range Pooling for 3D Large-Scale Scene Understanding

Multi-Scale and Multi-Network Deep Feature Fusion for Discriminative Scene Classification of High-Resolution Remote Sensing Images

Scene Recognition Using Deep Softpool Capsule Network Based on Residual Diverse Branch Block

Scene Classification Of High Resolution Remote Sensing Images Using Convolutional Neural Networks

An End-to-End Trainable Multi-Column CNN for Scene Recognition in Extremely Changing Environment

Remote Sensing Scene Classification Based on Multi-Structure Deep Features Fusion

DeepScene: Scene classification via convolutional neural network with spatial pyramid pooling