Multiscale 3-D-2-D Mixed CNN and Lightweight Attention-Free Transformer for Hyperspectral and LiDAR Classification
Le Sun, Xinyu Wang, Yuhui Zheng, Zebin Wu, Liyong Fu
DOI: https://doi.org/10.1109/tgrs.2024.3367374
IF: 8.2
2024-01-01
IEEE Transactions on Geoscience and Remote Sensing
Abstract: The effective combination of hyperspectral image (HSI) and light detection and ranging (LiDAR) data can be used for land cover classification. Recently, deep-learning-based classification methods, especially those using transformer networks, have achieved remarkable success. However, deep learning classification methods for multisource data still encounter various technical challenges, such as comprehensive utilization of multiscale information, lightweight network design, and efficient fusion strategies for heterogeneous data. To address these challenges, we propose a novel and efficient deep neural network based on CNN and transformer, namely, the multiscale 3-D-2-D mixed CNN feature extraction and multisource data lightweight attention-free fusion network (M2FNet). Through end-to-end training, this network effectively combines heterogeneous information from multiple sources, leading to improved performance in joint classification. Specifically, M2FNet uses a multiscale 3-D-2-D mixed CNN design to extract both the spatial-spectral features of HSI and the depth-based elevation features of LiDAR data. Subsequently, the extracted features are fed into a novel encoder, which plays a crucial role in integrating multisource information within the network. The encoder comprises a feature enhancement (FE) module, designed with mathematical morphology, and a dilated convolutional module derived from the self-attention of the conventional transformer encoder (DConvformer). The well-designed architecture enables the network to acquire multiscale depth and high-order features while significantly reducing the number of trainable parameters. Comparative experimental results and ablation studies demonstrate that M2FNet outperforms other advanced methods.
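The abstract names two building blocks of the encoder, mathematical morphology and dilated convolution, without giving their exact form. The following is a minimal illustrative sketch of those two generic operations in plain Python, not the M2FNet implementation. The function names `morph` and `dilated_conv1d`, the flat 3x3 structuring element, the border handling, and the toy `elevation` grid are all assumptions made for illustration.

```python
def morph(grid, op, k=1):
    """Flat grayscale morphology over a (2k+1)x(2k+1) window.

    op=max gives dilation, op=min gives erosion.
    Assumption: border cells use only the in-bounds part of the window.
    """
    h, w = len(grid), len(grid[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [grid[a][b]
                      for a in range(max(0, i - k), min(h, i + k + 1))
                      for b in range(max(0, j - k), min(w, j + k + 1))]
            row.append(op(window))
        out.append(row)
    return out


def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1-D dilated convolution (cross-correlation form).

    Kernel taps are spaced `dilation` samples apart, so the receptive
    field grows without adding any weights.
    """
    span = (len(kernel) - 1) * dilation + 1  # receptive field of one output
    return [sum(kernel[t] * signal[i + t * dilation] for t in range(len(kernel)))
            for i in range(len(signal) - span + 1)]


# Hypothetical LiDAR-like elevation patch: one tall object on flat ground.
elevation = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 9, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
dilated = morph(elevation, max)  # the peak spreads to its 3x3 neighbourhood
eroded = morph(elevation, min)   # the isolated peak is removed

signal = [1, 2, 3, 4, 5, 6, 7]
dense = dilated_conv1d(signal, [1, 1, 1], dilation=1)   # each output sees 3 samples
sparse = dilated_conv1d(signal, [1, 1, 1], dilation=2)  # each output sees a span of 5
```

In this toy setting, dilation and erosion respond to local elevation structure at the scale of the structuring element, and the dilated kernel covers a wider span with the same three weights, which is one general way a network can enlarge its receptive field while keeping the parameter count low.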