Abstract: Geometric information in normalized digital surface models (nDSMs) is highly correlated with the semantic class of the land cover. Jointly exploiting the two modalities, RGB and nDSM (height), has great potential to improve segmentation performance. However, this remains an under-explored field in remote sensing because of the following challenges. First, existing datasets are relatively small and limited in diversity, which restricts how thoroughly methods can be validated. Second, there is a lack of unified benchmarks for performance assessment, which makes it difficult to compare the effectiveness of different models. Last, sophisticated multi-modal semantic segmentation methods have not been deeply explored for remote sensing data. To cope with these challenges, we introduce a new remote-sensing benchmark dataset for multi-modal semantic segmentation based on RGB-Height (RGB-H) data. Towards a fair and comprehensive analysis of existing methods, the proposed benchmark consists of 1) a large-scale dataset including co-registered RGB and nDSM pairs and pixel-wise semantic labels; 2) a comprehensive evaluation and analysis of existing multi-modal fusion strategies for both convolutional and Transformer-based networks on remote sensing data. Furthermore, we propose a novel and effective Transformer-based intermediary multi-modal fusion (TIMF) module to improve semantic segmentation performance through adaptive token-level multi-modal fusion. The designed benchmark can foster future research on developing new methods for multi-modal learning on remote sensing data. Extensive analyses of those methods are conducted and valuable insights are provided through the experimental results.
Code for the benchmark and baselines can be accessed at https://github.com/EarthNets/RSI-MMSegmentation.

GeoImageNet: a multi-source natural feature benchmark dataset for GeoAI and supervised machine learning

Multi-modal Feature Fusion for Geographic Image Annotation

GeoNet: Benchmarking Unsupervised Adaptation across Geographies

EarthNets: Empowering AI in Earth Observation

GEO-Bench: Toward Foundation Models for Earth Monitoring

CMAB: A First National-Scale Multi-Attribute Building Dataset in China Derived from Open Source Data and GeoAI

GAMUS: A Geometry-aware Multi-modal Semantic Segmentation Benchmark for Remote Sensing Data

GeoDE: a Geographically Diverse Evaluation Dataset for Object Recognition

GeoAuxNet: Towards Universal 3D Representation Learning for Multi-sensor Point Clouds

AutoGeo: Automating Geometric Image Dataset Creation for Enhanced Geometry Understanding

IML-Net: A Framework for Cross-View Geo-Localization with Multi-Domain Remote Sensing Data

GeoAI in Social Science

University-1652: A Multi-view Multi-source Benchmark for Drone-based Geo-localization

MultiSenGE: A Multimodal and Multitemporal Benchmark Dataset for Land Use/Land Cover Remote Sensing Applications

Bigearthnet: A Large-Scale Benchmark Archive for Remote Sensing Image Understanding

Two novel benchmark datasets from ArcGIS and Bing world imagery for remote sensing image retrieval

GeoNet: Deep Geodesic Networks for Point Cloud Analysis

Game4Loc: A UAV Geo-Localization Benchmark from Game Data

When Visual Grounding Meets Gigapixel-level Large-scale Scenes: Benchmark and Approach

Self-consistent Deep Geometric Learning for Heterogeneous Multi-source Spatial Point Data Prediction