Global–Local Transformer Network for HSI and LiDAR Data Joint Classification

Kexing Ding, Ting Lu, Wei Fu, Shutao Li, Fuyan Ma
DOI: https://doi.org/10.1109/tgrs.2022.3216319
IF: 8.2
2022-01-01
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Hyperspectral images (HSIs) contain rich spatial and spectral detail, while light detection and ranging (LiDAR) data provide elevation information. Fusing HSI and LiDAR data can therefore yield more accurate image classification, and it has become an active research topic. However, it is difficult to capture complex local and global spatial-spectral associations, and building an effective interaction between multimodal data is another important issue. To this end, this article proposes a novel global-local transformer network (GLT-Net) for the joint classification of HSI and LiDAR data. The main idea is to fully exploit the strength of the convolution operator in characterizing locally correlated features and the capability of the transformer architecture in learning long-range dependencies. Moreover, multiscale feature fusion and probabilistic decision fusion strategies are designed within one framework to further improve classification performance. GLT-Net consists mainly of three parts: multiscale local spatial feature learning, global spectral feature learning, and global-local feature fusion classification. Specifically, multimodal image cubes of different sizes are first extracted and fed into convolutional neural networks (CNNs) to learn local spatial features, followed by multimodal information propagation and spatial-attention-guided multiscale feature fusion. Afterward, by treating the spectral feature channels as a sequence, vision transformers are introduced to model global spectral dependencies. Finally, multiple class estimations based on local and global features are integrated via a probabilistic decision fusion strategy. In this way, the complementary information of multimodal data and the local/global spectral-spatial information can be fully mined and jointly used. Extensive experiments on three popular HSI and LiDAR datasets demonstrate that the proposed method outperforms state-of-the-art methods. The source code of the proposed method will be made publicly available at https://github.com/Ding-Kexin/GLT-Net.
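The abstract does not spell out the fusion rule, so the following is only a minimal sketch of what a probabilistic decision fusion step could look like, assuming each branch (local and global) outputs class scores that are converted to probabilities and combined by a weighted average. The function names, the uniform default weights, and the example scores are all hypothetical and are not taken from GLT-Net or its repository.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_decisions(branch_logits, weights=None):
    """Fuse per-branch class scores by a weighted average of softmax probabilities.

    branch_logits: list of per-branch score lists, one score per class.
    weights: optional non-negative branch weights (assumption: uniform by default).
    Returns (fused_probabilities, predicted_class_index).
    """
    if weights is None:
        weights = [1.0] * len(branch_logits)
    weight_sum = sum(weights)
    probs = [softmax(scores) for scores in branch_logits]
    n_classes = len(probs[0])
    fused = [
        sum(w * p[c] for w, p in zip(weights, probs)) / weight_sum
        for c in range(n_classes)
    ]
    pred = max(range(n_classes), key=fused.__getitem__)
    return fused, pred


# Hypothetical scores for one pixel from a local (CNN) and a global (transformer) branch.
local_logits = [2.0, 1.0, 0.1]
global_logits = [0.5, 2.5, 0.3]
fused, pred = fuse_decisions([local_logits, global_logits])
```

In this sketch the two branches disagree on the most likely class, and the averaged probabilities decide the final label; a learned or confidence-based weighting could be substituted for the uniform weights.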