Dual-Branch Feature Fusion Network Based Cross-Modal Enhanced CNN and Transformer for Hyperspectral and LiDAR Classification
Wuli Wang, Chong Li, Peng Ren, Xinchao Lu, Jianbu Wang, Guangbo Ren, Baodi Liu
DOI: https://doi.org/10.1109/lgrs.2024.3367171
IF: 5.343
2024-03-09
IEEE Geoscience and Remote Sensing Letters
Abstract: The joint classification of hyperspectral image (HSI) and light detection and ranging (LiDAR) data has attracted considerable attention in the field of remote sensing. Integrating the advantages of the two data sources can provide precise data support and analytical decision-making for remote-sensing applications. However, because heterogeneous data differ inherently in their properties and semantic information, most existing deep-learning methods are suboptimal both at extracting the characteristic features of each data source and at exploiting the interactive information between them. In this letter, we propose a dual-branch feature fusion network-based cross-modal enhanced CNN and Transformer (DF2NCECT) that makes full use of both the modality-specific features and the interactive information of multisource data. DF2NCECT consists of two main stages. The first is the basic feature extraction stage, which builds a hybrid convolution module based on a 3DCNN and an inception structure to fully extract the joint features of HSI from multiple spatial perspectives. The second is the deep feature fusion stage, in which a CNN and a Transformer are arranged in parallel to fully explore and fuse deep features between HSI and LiDAR. More importantly, to capture effective interactive information between HSI and LiDAR, a cross-modal enhanced CNN and Transformer module (CECT) is designed to deeply enhance the fused interactive features from both global and local perspectives. Experiments show that the proposed method outperforms the comparison methods in overall accuracy (OA) by an average of 3.06% on the Houston2013 dataset and 1.79% on the Summer dataset.
Imaging Science & Photographic Technology; Remote Sensing; Engineering, Electrical & Electronic; Geochemistry & Geophysics
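The abstract describes HSI and LiDAR features interacting through a Transformer branch and reports results as overall accuracy (OA). The snippet below is a minimal, hypothetical sketch of those two ideas, not the authors' CECT module. It uses generic scaled dot-product cross-attention in which HSI tokens act as queries and LiDAR tokens supply keys and values. Learned projection matrices are omitted (treated as identity), and the attended LiDAR context is added back to the HSI tokens as a residual "enhancement". OA is computed in its standard form: the fraction of test samples whose predicted class matches the reference label. All function names and the toy inputs are illustrative assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an (n x k) matrix and a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    """Swap rows and columns."""
    return [list(row) for row in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Numerically stable softmax applied independently to each row."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out


def cross_modal_attention(hsi_tokens: Matrix, lidar_tokens: Matrix) -> Matrix:
    """Hypothetical cross-attention in which HSI tokens query LiDAR tokens.

    Assumption: query/key/value projections are identity maps, so this only
    illustrates the attention pattern, not a trained fusion module.
    """
    d = len(hsi_tokens[0])
    scores = matmul(hsi_tokens, transpose(lidar_tokens))  # (n_hsi x n_lidar)
    scores = [[s / math.sqrt(d) for s in row] for row in scores]
    weights = softmax_rows(scores)                         # each row sums to 1
    context = matmul(weights, lidar_tokens)                # (n_hsi x d)
    # Residual enhancement: add the attended LiDAR context to each HSI token.
    return [[h + c for h, c in zip(hr, cr)] for hr, cr in zip(hsi_tokens, context)]


def overall_accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """OA: fraction of samples whose predicted label equals the reference label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    hsi = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # three toy HSI tokens, dim 2
    lidar = [[2.0, 0.0], [0.0, 2.0]]            # two toy LiDAR tokens, dim 2
    fused = cross_modal_attention(hsi, lidar)
    print(len(fused), len(fused[0]))            # 3 2
    print(overall_accuracy([0, 1, 2, 2], [0, 1, 1, 2]))  # 0.75
```

Each HSI token draws most heavily on the LiDAR tokens it is most similar to. In a real network the identity projections would be replaced by learned linear layers, and the OA improvements quoted in the abstract would be computed on held-out test pixels.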