Abstract: In recent years, deep learning-based multi-source data fusion, e.g., fusion of hyperspectral image (HSI) and light detection and ranging (LiDAR) data, has gained significant attention in remote sensing. However, traditional convolutional neural network fusion techniques often extract discriminative spatial–spectral features poorly from diversified land covers and overlook the correlation and complementarity between different data sources. Furthermore, merely stacking multi-source feature embeddings fails to represent the deep semantic relationships among them. In this paper, we propose a cross attention-based multi-scale convolutional fusion network for HSI-LiDAR joint classification. It contains three major modules: a spatial–elevation–spectral convolutional feature extraction module (SESM), a cross attention fusion module (CAFM), and a classification module. In the SESM, improved multi-scale convolutional blocks extract features from HSI and LiDAR to ensure discriminability and comprehensiveness under diversified land cover conditions. The module employs spatial and spectral pseudo-3D convolutions, pointwise convolutions, residual aggregation, one-shot aggregation, and parameter sharing. In the CAFM, a self-designed local-global cross attention block collects and integrates the relationships among the feature embeddings and generates joint semantic representations. In the classification module, average pooling, dropout, and linear layers map the fused semantic representations to the final classification results. Experimental evaluations on three public HSI-LiDAR datasets demonstrate the competitiveness of the proposed network in comparison with state-of-the-art methods.
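
To make the fusion and classification steps concrete, the following is a minimal sketch of generic scaled dot-product cross attention between two sets of feature embeddings, followed by average pooling and a linear classifier. It is not the paper's local-global cross attention block, whose details the abstract does not give. All names, token counts, dimensions, and the random weights (`cross_attention`, `classify`, `n_hsi`, `n_lidar`, `d`, `n_classes`) are assumptions chosen for illustration, and dropout is omitted because it acts as the identity at inference time.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(query_tokens, context_tokens, w_q, w_k, w_v):
    """Queries come from one modality; keys and values come from the other."""
    q = query_tokens @ w_q        # (n_query, d)
    k = context_tokens @ w_k      # (n_context, d)
    v = context_tokens @ w_v      # (n_context, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights


def classify(fused_tokens, w_out, b_out):
    """Average-pool the tokens, then apply a linear layer and softmax."""
    pooled = fused_tokens.mean(axis=0)  # (d,)
    logits = pooled @ w_out + b_out     # (n_classes,)
    return softmax(logits)


# Hypothetical sizes: 9 tokens per modality, 16-dim embeddings, 6 classes.
rng = np.random.default_rng(0)
n_hsi, n_lidar, d, n_classes = 9, 9, 16, 6

# Stand-ins for the feature embeddings produced by a convolutional extractor.
hsi_tokens = rng.standard_normal((n_hsi, d))
lidar_tokens = rng.standard_normal((n_lidar, d))

# Random projection weights; in a trained network these would be learned.
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

# Attend in both directions so each modality is enriched by the other.
hsi_from_lidar, attn_hsi_to_lidar = cross_attention(
    hsi_tokens, lidar_tokens, w_q, w_k, w_v
)
lidar_from_hsi, attn_lidar_to_hsi = cross_attention(
    lidar_tokens, hsi_tokens, w_q, w_k, w_v
)

# Residual connections, then concatenate along the token axis.
fused = np.concatenate(
    [hsi_tokens + hsi_from_lidar, lidar_tokens + lidar_from_hsi], axis=0
)

w_out = rng.standard_normal((d, n_classes)) / np.sqrt(d)
b_out = np.zeros(n_classes)
probs = classify(fused, w_out, b_out)
print(fused.shape, probs.shape)
```

The point of the sketch is the contrast with plain stacking: each output token is a weighted mixture of the other modality's tokens, so the fused representation depends on pairwise relationships between HSI and LiDAR embeddings rather than on their concatenation alone.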

A Joint Convolutional Cross ViT Network for Hyperspectral and Light Detection and Ranging Fusion Classification

Multiscale 3-D-2-D Mixed CNN and Lightweight Attention-Free Transformer for Hyperspectral and LiDAR Classification

A Novel Transformer Network with a CNN-Enhanced Cross-Attention Mechanism for Hyperspectral Image Classification

Cross Attention-Based Multi-Scale Convolutional Fusion Network for Hyperspectral and LiDAR Joint Classification

Aerial Scene Classification Via Multilevel Fusion Based on Deep Convolutional Neural Networks

Information Fusion for Classification of Hyperspectral and LiDAR Data Using IP-CNN

Joint Classification of Hyperspectral Images and LiDAR Data Based on Dual-Branch Transformer

Dual-Branch Feature Fusion Network Based Cross-Modal Enhanced CNN and Transformer for Hyperspectral and LiDAR Classification

Bridging CNN and Transformer With Cross-Attention Fusion Network for Hyperspectral Image Classification

Hybrid FusionNet: A Hybrid Feature Fusion Framework for Multisource High-Resolution Remote Sensing Image Classification

Classification of hyperspectral and LiDAR data by transformer-based enhancement

Multi-layer feature fusion for hyperspectral image classification

Hybrid Conv-ViT Network for Hyperspectral Image Classification

Classification of Hyperspectral and LiDAR Data Using Coupled CNNs

Multimodal Fusion Transformer for Remote Sensing Image Classification

Heterogeneous feature learning network for multimodal remote sensing image collaborative classification

A Hierarchical Coarse–Fine Adaptive Fusion Network for the Joint Classification of Hyperspectral and LiDAR Data

An Efficient Cross-Modality Self-Calibrated Network for Hyperspectral and Multispectral Image Fusion

A multimodal hyper-fusion transformer for remote sensing image classification

Joint Classification of Hyperspectral and LiDAR Data Using Height Information Guided Hierarchical Fusion-and-Separation Network