Abstract:Land use and land cover maps provide fundamental information that has been used in different types of studies, ranging from public health to carbon cycling. However, the existing remote sensing image classification methods thus far suffer from the insufficient usage of multiple modalities, underconsideration of prior domain knowledge, and poor performance on minority classes. To alleviate these problems, we propose a novel domain knowledge-guided deep collaborative fusion network (DKDFN) with performance boosting for minority categories for land cover classification. More specifically, the DKDFN adopts a multihead encoder and a multibranch decoder structure. The architecture of the encoder probablizes sufficient mining of complementary information from multiple modalities, which are Sentinel-2, Sentinel-1, and SRTM Digital Elevation Data (SRTM) in our case. The multibranch decoder enables land cover classification in a multitask learning setup, performing semantic segmentation and reconstructing multimodal remote sensing indices, which are selected as representatives of domain knowledge. This design incorporates domain knowledge in an effective end-to-end manner. The training stage of our DKDFN is supervised by our proposed asymmetry loss function (ALF), which boosts performance on nearly all categories, especially the categories with a low frequency of occurrence. Ablation studies of the network suggest that our design logic is worth testing in any network with an encoder-decoder structure. The study is conducted in Hunan, China and is verified using a self-labeled multimodal unitemporal remote sensing image dataset. The comparative experiments between DKDFN and 6 state-of-the-art models (U-Net, SegNet, PSPNet, DeepLab, HRNet, MP-ResNet) testify to the superiority of our method and suggest its potential to be applied more widely to map land cover in other geographical areas given the availability of Sentinel-2, Sentinel-1, and SRTM data. The dataset can be downloaded by https://github.com/LauraChow/HunanMultimodalDataset.

Learning rich multimodal representation for robust land cover classification in fog

Semantic Representation Fusion-Based Network for Robust Land Cover Classification in Foggy Conditions.

MFFnet: Multimodal Feature Fusion Network for Synthetic Aperture Radar and Optical Image Land Cover Classification

Multimodal Semantic Consistency-Based Fusion Architecture Search for Land Cover Classification

Deep Multimodal Fusion Network for Semantic Segmentation Using Remote Sensing Image and LiDAR Data

Deep Feature Fusion for High-Resolution Aerial Scene Classification

CMSE: Cross-Modal Semantic Enhancement Network for Classification of Hyperspectral and LiDAR Data

LMFNet: An Efficient Multimodal Fusion Approach for Semantic Segmentation in High-Resolution Remote Sensing

More Diverse Means Better: Multimodal Deep Learning Meets Remote Sensing Imagery Classification

Collaborative Attention-Based Heterogeneous Gated Fusion Network for Land Cover Classification

Learning SAR-Optical Cross Modal Features for Land Cover Classification

Multimodal Bilinear Fusion Network with Second-Order Attention-Based Channel Selection for Land Cover Classification

DKDFN: Domain Knowledge-Guided deep collaborative fusion network for multimodal unitemporal remote sensing land cover classification

Alignment and Fusion Using Distinct Sensor Data for Multimodal Aerial Scene Classification

A Hierarchical Coarse–Fine Adaptive Fusion Network for the Joint Classification of Hyperspectral and LiDAR Data

Robust 3D Semantic Segmentation Method Based on Multi-Modal Collaborative Learning

A Crossmodal Multiscale Fusion Network for Semantic Segmentation of Remote Sensing Data

Robust-FusionNet: Deep Multimodal Sensor Fusion for 3-D Object Detection Under Severe Weather Conditions

CIMFNet: Cross-layer Interaction and Multiscale Fusion Network for Semantic Segmentation of High-Resolution Remote Sensing Images

Reinforcement Learning Based Markov Edge Decoupled Fusion Network for Fusion Classification of Hyperspectral and LiDAR

Learning transferable cross-modality representations for few-shot hyperspectral and LiDAR collaborative classification