Abstract: The rapid increase in the spatial resolution of remote sensing scene images (RSIs) has made the spatial contextual information they contain correspondingly more complex. Numerous small features coexist in a single scene, which makes them hard to locate and mine accurately and, in turn, makes accurate interpretation difficult. To address these issues, this article proposes a dynamic convolution covariance network (ODFMN) based on omni-dimensional dynamic convolution, which extracts multidimensional and multiscale features from RSIs and builds a higher-order statistical representation of the feature information. First, to fully exploit the complex spatial context of RSIs and to overcome the limitation of extracting features with a single static convolution kernel, we construct an omni-dimensional feature extraction module based on dynamic convolution, which fully extracts the 4-D information within the convolution kernel. Then, to make full use of the omni-dimensional feature information extracted at each level of the network, a multiscale feature fusion module enriches the feature representation by establishing relationships from local to global. Finally, higher-order statistical information is employed because first-order information alone struggles to represent smaller object features. Experiments on publicly available datasets show that the method achieves classification accuracies of 99.04%, 95.34%, and 92.50% on the respective datasets. Feature visualization further shows that the method accurately captures the contours, shapes, and spatial context information of feature targets.
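The abstract does not say how the higher-order statistics are computed. One common second-order choice is covariance pooling, and the minimal sketch below assumes that choice; it is not the authors' implementation. The function names (`global_average_pooling`, `covariance_pooling`) and the toy feature maps are hypothetical and serve only to illustrate why first-order information can fail to separate two feature maps that second-order information distinguishes.

```python
# Illustrative sketch only: covariance pooling is assumed here as one possible
# form of "higher-order statistical information"; the abstract does not specify it.

def global_average_pooling(features):
    """First-order statistic: the mean activation of each channel.

    `features` is a list of C channels, each a flat list of N spatial activations.
    """
    return [sum(channel) / len(channel) for channel in features]


def covariance_pooling(features):
    """Second-order statistic: the C x C covariance matrix across channels."""
    num_channels = len(features)
    num_positions = len(features[0])
    if any(len(channel) != num_positions for channel in features):
        raise ValueError("all channels must have the same number of positions")

    # Center every channel by subtracting its own mean.
    means = global_average_pooling(features)
    centered = [
        [value - mean for value in channel]
        for channel, mean in zip(features, means)
    ]

    # Entry (i, j) measures how channels i and j co-vary over spatial positions.
    return [
        [
            sum(a * b for a, b in zip(centered[i], centered[j])) / num_positions
            for j in range(num_channels)
        ]
        for i in range(num_channels)
    ]


# Two hypothetical 2-channel feature maps with 4 spatial positions each.
features_a = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
features_b = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]

# The first-order descriptors are identical for both maps ...
print(global_average_pooling(features_a) == global_average_pooling(features_b))

# ... while the off-diagonal covariance entries have opposite signs.
print(covariance_pooling(features_a)[0][1], covariance_pooling(features_b)[0][1])
```

In this toy case the channel means coincide, so average pooling cannot tell the two maps apart, whereas the cross-channel covariance does. A real network would typically compute this on tensors and normalize the resulting matrix before classification.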

Dynamic Texture and Scene Classification by Transferring Deep Image Features

Dynamic Convolution Covariance Network Using Multi-Scale Feature Fusion for Remote Sensing Scene Image Classification

Dynamic Spatio-Temporal Feature Learning via Graph Convolution in 3D Convolutional Networks

Convolutional Neural Network on Three Orthogonal Planes for Dynamic Texture Classification

Stacked Convolutional Deep Encoding Network for Video-Text Retrieval

Dynamic information enhancement for video classification

Cross-domain Residual Deep NMF for Transfer Learning Between Different Hyperspectral Image Scenes

Deep Structure-Revealed Network for Texture Recognition

Dynamic texture recognition with video set based collaborative representation

MSSTNet: A Multi-Scale Spatio-Temporal CNN-Transformer Network for Dynamic Facial Expression Recognition

Transfer Classification for Distinct Manifestations with Shared Information

Texture-guided Coding for Deep Features

HDTFF-Net: Hierarchical Deep Texture Features Fusion Network for High-resolution Remote Sensing Scene Classification

Dynamic Texture Transfer using PatchMatch and Transformers

Video-to-Image Casting: A Flatting Method for Video Analysis

Deep Unsupervised Key Frame Extraction for Efficient Video Classification

Deep Multiple-Attribute-Perceived Network For Real-World Texture Recognition

Learning Deep Intrinsic Video Representation by Exploring Temporal Coherence and Graph Structure

Temporal 3D ConvNets: New Architecture and Transfer Learning for Video Classification

Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning