Abstract: The rapid increase in the spatial resolution of remote sensing scene images (RSIs) has brought a corresponding increase in the complexity of their spatial contextual information. The coexistence of numerous small features makes these features hard to locate and mine accurately, which in turn makes accurate interpretation difficult. To address these issues, this article proposes a dynamic convolution covariance network (ODFMN) based on omni-dimensional dynamic convolution, which extracts multidimensional and multiscale features from RSIs and builds a higher-order statistical representation of the feature information. First, to fully exploit the complex spatial context information of RSIs while overcoming the limitation of a single static convolution kernel for feature extraction, we construct an omni-dimensional feature extraction module based on dynamic convolution, which fully extracts the 4-D information within the convolution kernel. Then, to make full use of the full-dimensional feature information extracted at each level of the network, we construct a multiscale feature fusion module that enriches the feature representation by establishing relationships from local to global. Finally, higher-order statistical information is employed because first-order information alone struggles to represent the features of smaller objects. Experiments on publicly available datasets show that the method achieves classification accuracies of 99.04%, 95.34%, and 92.50%, respectively. Furthermore, feature visualization verifies that the method captures the contours, shapes, and spatial context information of feature targets with high accuracy.
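
The contrast between first-order and higher-order statistics mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the network's exact formulation, so the code below is an assumption: it uses plain sample covariance over spatial positions as the second-order statistic, with global average pooling as the first-order baseline. The function names `average_pooling` and `covariance_pooling` are hypothetical and are not taken from the paper.

```python
# Minimal sketch (assumption: plain sample covariance, not the paper's exact layer).
# A feature map is flattened to N spatial positions, each with C channel responses.

def average_pooling(features):
    """First-order statistic: per-channel mean over all spatial positions."""
    n = len(features)
    c = len(features[0])
    return [sum(row[j] for row in features) / n for j in range(c)]


def covariance_pooling(features):
    """Second-order statistic: C x C sample covariance between channels."""
    n = len(features)
    if n < 2:
        raise ValueError("need at least two spatial positions")
    c = len(features[0])
    means = average_pooling(features)
    # Center every channel so the products measure co-variation, not magnitude.
    centered = [[row[j] - means[j] for j in range(c)] for row in features]
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(c)]
        for i in range(c)
    ]


if __name__ == "__main__":
    # Three spatial positions, two channels (illustrative values only).
    feats = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
    print(average_pooling(feats))     # [3.0, 6.0]
    print(covariance_pooling(feats))  # [[4.0, 8.0], [8.0, 16.0]]
```

The mean vector only records how strongly each channel responds on average, whereas the covariance matrix also records how channels vary together across positions, which is the kind of information a second-order representation adds.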

Denoising-Based Multiscale Feature Fusion for Remote Sensing Image Captioning

Dynamic Convolution Covariance Network Using Multi-Scale Feature Fusion for Remote Sensing Scene Image Classification

Multi-label Semantic Feature Fusion for Remote Sensing Image Captioning

DFEN: Dual Feature Enhancement Network for Remote Sensing Image Caption

TSFE: Two-Stage Feature Enhancement for Remote Sensing Image Captioning

Multi-Scale Cropping Mechanism for Remote Sensing Image Captioning

Remote Sensing Image Captioning Based on Multi-Level Feature Extraction and Adaptive Attention

Remote Sensing Image Captioning with Sequential Attention and Flexible Word Correlation

Exploring Multi-Level Attention and Semantic Relationship for Remote Sensing Image Captioning

MfrNet: A New Multi-Scale Feature Refining Method for Remote Sensing Image Change Captioning

Remote Sensing Image Captioning with Multi-Scale Feature and Small Target Attention

Multi-View Feature Fusion and Visual Prompt for Remote Sensing Image Captioning

Multiscale Methods for Optical Remote-Sensing Image Captioning

M-FFN: multi-scale feature fusion network for image captioning

Fine-Grained Features for Image Captioning

Dual attention deep fusion semantic segmentation networks of large-scale satellite remote-sensing images

Remote Sensing Image Semantic Segmentation Network Based on Multi-Scale Feature Enhancement Fusion

A Denoising Framework for Image Caption

Multi-channel Weighted Fusion for Image Captioning

Multi-Attention Fusion and Fine-Grained Alignment for Bidirectional Image-Sentence Retrieval in Remote Sensing

Remote Sensing Image Semantic Segmentation Method Based on a Deep Convolutional Neural Network and Multiscale Feature Fusion