Abstract: Land use and land cover maps provide fundamental information used in studies ranging from public health to carbon cycling. However, existing remote sensing image classification methods make insufficient use of multiple modalities, give little consideration to prior domain knowledge, and perform poorly on minority classes. To alleviate these problems, we propose a novel domain knowledge-guided deep collaborative fusion network (DKDFN) for land cover classification that boosts performance on minority categories. More specifically, the DKDFN adopts a multihead encoder and a multibranch decoder structure. The encoder architecture enables thorough mining of complementary information from multiple modalities, which in our case are Sentinel-2, Sentinel-1, and SRTM Digital Elevation Data (SRTM). The multibranch decoder performs land cover classification in a multitask learning setup, jointly carrying out semantic segmentation and reconstructing multimodal remote sensing indices, which are selected as representatives of domain knowledge. This design incorporates domain knowledge in an effective, end-to-end manner. The training of DKDFN is supervised by our proposed asymmetry loss function (ALF), which boosts performance on nearly all categories, especially those with a low frequency of occurrence. Ablation studies of the network suggest that our design logic is worth testing in any network with an encoder-decoder structure. The study is conducted in Hunan, China, and is verified using a self-labeled multimodal unitemporal remote sensing image dataset. Comparative experiments between DKDFN and six state-of-the-art models (U-Net, SegNet, PSPNet, DeepLab, HRNet, and MP-ResNet) demonstrate the superiority of our method and suggest its potential for mapping land cover in other geographical areas, given the availability of Sentinel-2, Sentinel-1, and SRTM data.
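The multibranch decoder described above reconstructs remote sensing indices alongside the segmentation map. The abstract does not name the indices or how the losses are weighted, so the following is only a minimal, dependency-free sketch under stated assumptions: NDVI and NDWI computed from Sentinel-2 bands (B8 as NIR, B4 as red, B3 as green) stand in as hypothetical reconstruction targets, and plain cross-entropy stands in for the ALF, whose form is not given here. The helper names and the weight `lam` are illustrative, not the authors' implementation.

```python
import math


def normalized_difference(a, b, eps=1e-6):
    """Per-pixel normalized-difference index (a - b) / (a + b); eps avoids division by zero."""
    return [(x - y) / (x + y + eps) for x, y in zip(a, b)]


def ndvi(nir, red):
    # Assumed target: NDVI from Sentinel-2 B8 (NIR) and B4 (red).
    return normalized_difference(nir, red)


def ndwi(green, nir):
    # Assumed target: McFeeters NDWI from Sentinel-2 B3 (green) and B8 (NIR).
    return normalized_difference(green, nir)


def cross_entropy(probs, labels):
    # Mean negative log-likelihood of the true class over pixels.
    # Placeholder for the ALF, whose exact form is not reproduced here.
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def mse(pred, target):
    # Mean squared error for one reconstructed index map.
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def multitask_loss(seg_probs, seg_labels, index_preds, index_targets, lam=0.5):
    """Hypothetical multitask objective: segmentation loss + lam * sum of index reconstruction MSEs."""
    seg = cross_entropy(seg_probs, seg_labels)
    rec = sum(mse(p, t) for p, t in zip(index_preds, index_targets))
    return seg + lam * rec
```

For example, `ndvi([0.5, 0.4], [0.1, 0.2])` gives roughly `[0.667, 0.333]`. In an actual network these would be per-pixel tensors, and the index branch would regress precomputed index maps; plain lists are used here only to keep the sketch self-contained.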
The dataset can be downloaded from https://github.com/LauraChow/HunanMultimodalDataset.

Robust Land Cover Classification with Multimodal Knowledge Distillation

Multiscale 3-D-2-D Mixed CNN and Lightweight Attention-Free Transformer for Hyperspectral and LiDAR Classification

Multimodal Online Knowledge Distillation Framework for Land Use/Cover Classification Using Full or Missing Modalities

Dense Adaptive Grouping Distillation Network for Multimodal Land Cover Classification with Privileged Modality

More Diverse Means Better: Multimodal Deep Learning Meets Remote Sensing Imagery Classification

Uni-to-Multi Modal Knowledge Distillation for Bidirectional LiDAR-Camera Semantic Segmentation

Multispectral Scene Classification via Cross-Modal Knowledge Distillation

Multisensor Land Cover Classification With Sparsely Annotated Data Based on Convolutional Neural Networks and Self-Distillation

CMR-net: A cross modality reconstruction network for multi-modality remote sensing classification

DKDFN: Domain Knowledge-Guided deep collaborative fusion network for multimodal unitemporal remote sensing land cover classification

Dual-Branch Feature Fusion Network Based Cross-Modal Enhanced CNN and Transformer for Hyperspectral and LiDAR Classification

Convolutional Neural Networks for Multimodal Remote Sensing Data Classification

Multimodal Semantic Collaborative Classification for Hyperspectral Images and LiDAR Data

Multimodal Semantic Consistency-Based Fusion Architecture Search for Land Cover Classification

Instance-Level Scaling and Dynamic Margin-Alignment Knowledge Distillation for Remote Sensing Image Scene Classification

Land cover classification from remote sensing images based on multi-scale fully convolutional network

CMDFusion: Bidirectional Fusion Network with Cross-modality Knowledge Distillation for LIDAR Semantic Segmentation

Multimodal Bilinear Fusion Network with Second-Order Attention-Based Channel Selection for Land Cover Classification

Assisted learning for land use classification: The important role of semantic correlation between heterogeneous images

Pair-Wise Similarity Knowledge Distillation for RSI Scene Classification

Learning rich multimodal representation for robust land cover classification in fog