Cross Hyperspectral and LiDAR Attention Transformer: An Extended Self-Attention for Land Use and Land Cover Classification

Swalpa Kumar Roy, Atri Sukul, Ali Jamali, Juan M. Haut, Pedram Ghamisi
DOI: https://doi.org/10.1109/tgrs.2024.3374324
IF: 8.2
2024-04-05
IEEE Transactions on Geoscience and Remote Sensing
Abstract: The success of attention-driven deep models such as the vision transformer (ViT) has sparked interest in cross-domain exploration. However, current transformer-based techniques in remote sensing (RS) focus primarily on single-modal data, which limits their ability to fully exploit the growing array of multimodal Earth observation (EO) data. Enhancing these models for multimodal integration is crucial for comprehensive RS applications. To this end, we extend the traditional self-attention mechanism by introducing cross hyperspectral (HS) and light detection and ranging (LiDAR) attention, termed Cross-HL. We present a novel multimodal deep learning framework that fuses RS data to improve land use and land cover (LULC) recognition. To exchange information accurately across modalities, we fuse their patch projections using the Cross-HL self-attention module. In this process, LiDAR patch tokens serve as queries (Q), while keys (K) and values (V) are derived from HS patch tokens. To demonstrate the superiority of Cross-HL in the proposed framework, we conducted extensive experiments on three multimodal RS benchmark datasets, Houston, Trento, and MUUFL, each of which contains HS and LiDAR data. The source code for Cross-HL will be made publicly available at https://github.com/AtriSukul1508/Cross-HL.
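The query/key/value assignment described in the abstract can be illustrated with a minimal single-head sketch. This is not the authors' implementation (see the repository linked above for that): it assumes standard scaled dot-product attention, and the token counts, the embedding size `dim`, the random projection matrices `w_q`, `w_k`, `w_v`, and all helper names are hypothetical choices made only for illustration.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(lidar_tokens, hs_tokens, w_q, w_k, w_v):
    """Single-head cross-attention: LiDAR tokens query the HS tokens."""
    q = matmul(lidar_tokens, w_q)          # queries from LiDAR, (n_lidar, d)
    k = matmul(hs_tokens, w_k)             # keys from HS, (n_hs, d)
    v = matmul(hs_tokens, w_v)             # values from HS, (n_hs, d)
    scale = math.sqrt(len(q[0]))           # assumed scaling: sqrt(d)
    k_t = [list(col) for col in zip(*k)]   # transpose keys to (d, n_hs)
    scores = matmul(q, k_t)                # similarity scores, (n_lidar, n_hs)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights     # fused tokens, attention weights


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned projections and patch embeddings."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
dim = 8                                     # assumed embedding size
lidar_tokens = random_matrix(1, dim, rng)   # assumed: one LiDAR patch token
hs_tokens = random_matrix(4, dim, rng)      # assumed: four HS patch tokens
w_q, w_k, w_v = (random_matrix(dim, dim, rng) for _ in range(3))

fused, weights = cross_attention(lidar_tokens, hs_tokens, w_q, w_k, w_v)
print(len(fused), len(fused[0]))            # one fused token per LiDAR query
```

Because the queries come from LiDAR and the values from HS, the output has one row per LiDAR token, each a weighted mixture of HS-derived values. A practical module would likely add multiple heads, learned weights, and normalization layers, none of which are shown here.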
imaging science & photographic technology; remote sensing; engineering, electrical & electronic; geochemistry & geophysics
What problem does this paper attempt to address?