Abstract: In recent years, deep learning-based multi-source data fusion, e.g., the fusion of hyperspectral image (HSI) and light detection and ranging (LiDAR) data, has gained significant attention in the field of remote sensing. However, traditional convolutional neural network fusion techniques often extract discriminative spatial–spectral features poorly from diversified land covers and overlook the correlation and complementarity between different data sources. Furthermore, merely stacking multi-source feature embeddings fails to represent the deep semantic relationships among them. In this paper, we propose a cross attention-based multi-scale convolutional fusion network for HSI-LiDAR joint classification. It contains three major modules: a spatial–elevation–spectral convolutional feature extraction module (SESM), a cross attention fusion module (CAFM), and a classification module. In the SESM, improved multi-scale convolutional blocks are utilized to extract features from HSI and LiDAR to ensure discriminability and comprehensiveness in diversified land cover conditions. Spatial and spectral pseudo-3D convolutions, pointwise convolutions, residual aggregation, one-shot aggregation, and parameter-sharing techniques are implemented in the module. In the CAFM, a self-designed local-global cross attention block is utilized to collect and integrate the relationships among the feature embeddings and generate joint semantic representations. In the classification module, average pooling, dropout, and linear layers are used to map the fused semantic representations to the final classification results. Experimental evaluations on three public HSI-LiDAR datasets demonstrate the competitiveness of the proposed network in comparison with state-of-the-art methods.
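To make the fusion idea concrete, the following is a minimal NumPy sketch of generic single-head cross attention between two sets of feature embeddings, followed by average pooling and a linear layer. It is not the paper's local-global cross attention block, whose details the abstract does not give. The token count (49), embedding width (16), class count (15), random weights, residual-plus-concatenation fusion, and all function and variable names are illustrative assumptions, and dropout is omitted because the sketch only covers inference.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, w_q, w_k, w_v):
    """Generic single-head cross attention: `queries` attend to `context`."""
    q = queries @ w_q                          # (n_q, d)
    k = context @ w_k                          # (n_c, d)
    v = context @ w_v                          # (n_c, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # scaled dot products, (n_q, n_c)
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
d = 16                                         # assumed embedding width
hsi_tokens = rng.standard_normal((49, d))      # stand-in for HSI feature embeddings
lidar_tokens = rng.standard_normal((49, d))    # stand-in for LiDAR feature embeddings
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

# Each source queries the other, so information flows in both directions.
hsi_from_lidar, attn = cross_attention(hsi_tokens, lidar_tokens, w_q, w_k, w_v)
lidar_from_hsi, _ = cross_attention(lidar_tokens, hsi_tokens, w_q, w_k, w_v)

# Assumed fusion rule: residual connection per source, then concatenation.
fused = np.concatenate(
    [hsi_tokens + hsi_from_lidar, lidar_tokens + lidar_from_hsi], axis=-1
)                                              # (49, 2 * d)

# Classification head: average pooling over tokens, then one linear layer.
n_classes = 15                                 # assumed number of land-cover classes
pooled = fused.mean(axis=0)                    # (2 * d,)
w_cls = rng.standard_normal((2 * d, n_classes)) / np.sqrt(2 * d)
logits = pooled @ w_cls                        # (n_classes,)
pred = int(np.argmax(logits))
```

In contrast to plain stacking of the two embeddings, each output token here is a weighted mixture of the other source's tokens, which is one way pairwise relationships between the sources can enter the fused representation.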

Classification of hyperspectral and LiDAR data by transformer-based enhancement

Multiscale 3-D-2-D Mixed CNN and Lightweight Attention-Free Transformer for Hyperspectral and LiDAR Classification

A Novel Transformer Network with a CNN-Enhanced Cross-Attention Mechanism for Hyperspectral Image Classification

Joint Classification of Hyperspectral Images and LiDAR Data Based on Dual-Branch Transformer

Dual-Branch Feature Fusion Network Based Cross-Modal Enhanced CNN and Transformer for Hyperspectral and LiDAR Classification

A Joint Convolutional Cross ViT Network for Hyperspectral and Light Detection and Ranging Fusion Classification

Cross Attention-Based Multi-Scale Convolutional Fusion Network for Hyperspectral and LiDAR Joint Classification

A multimodal hyper-fusion transformer for remote sensing image classification

Mutually Beneficial Transformer for Multimodal Data Fusion

Joint Classification of Hyperspectral and LiDAR Data Based on Adaptive Gating Mechanism and Learnable Transformer

Hierarchical Attention and Parallel Filter Fusion Network for Multi-Source Data Classification

LiDAR-Guided Cross-Attention Fusion for Hyperspectral Band Selection and Image Classification

Multimodal Fusion Transformer for Remote Sensing Image Classification

Hierarchical Attention and Parallel Filter Fusion Network for Multisource Data Classification

A Hierarchical Coarse–Fine Adaptive Fusion Network for the Joint Classification of Hyperspectral and LiDAR Data

Multimodal Hyperspectral Image Classification via Interconnected Fusion

Multi-layer feature fusion for hyperspectral image classification

Deep Hierarchical Vision Transformer for Hyperspectral and LiDAR Data Classification

Multilevel Attention Dynamic-Scale Network for HSI and LiDAR Data Fusion Classification

Relative Total Variation Structure Analysis-Based Fusion Method for Hyperspectral and LiDAR Data Classification