Joint Classification of Hyperspectral Image and LiDAR Data Based on Spectral Prompt Tuning

Yi Kong, Yuhu Cheng, Yang Chen, Xuesong Wang
DOI: https://doi.org/10.1109/tgrs.2024.3417475
Impact factor (IF): 8.2
2024-01-01
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Pretrained vision-language models (VLMs) have achieved outstanding performance in various visual tasks, primarily due to the knowledge they have acquired from massive image-text pairs. This knowledge enables VLMs to generalize to a wide range of downstream tasks. This article presents the first attempt to adapt VLMs to the joint classification of hyperspectral image (HSI) and LiDAR data, aiming to leverage well-trained VLMs to extract more generalizable features from diverse remote sensing image sources. First, a patch encoder (PE) transforms low-dimensional patches of HSI and LiDAR data into high-dimensional latent feature representations, meeting the dimensional requirements that VLMs impose on visual input. Unlike traditional classifiers that rely on discrete class labels, VLM-based classification methods depend on continuous vectors, which can be derived from textual templates containing class names, i.e., prompts. The classification performance of VLM-based methods relies heavily on these prompts, but prompt engineering not only demands extensive expert knowledge but is also extremely time-consuming. To address this, prompt tuning (PT) methods are introduced to enhance the generalizability of VLMs by adding spectral-based prompts to the vision encoder and incorporating randomly initialized, learnable text prompts (TPs) into the text encoder. Finally, a novel class-discriminative loss function increases the distance between the text features of different classes, thereby enhancing the model's discriminative ability. Experimental results on the Houston 2013, Trento, and MUUFL datasets demonstrate that the proposed method achieves competitive classification accuracy with a limited number of labeled pixels.
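The abstract describes VLM-based classification only in words: each class is represented by a text feature produced from a prompt, a pixel is assigned to the class whose text feature is most similar to its image feature, and the class-discriminative loss pushes the text features of different classes apart. The minimal Python sketch below illustrates that scoring under stated assumptions. The CLIP-style temperature of 0.01, the loss weight of 0.1, and the form of the discriminative term (mean pairwise cosine similarity between class text features) are assumptions for illustration, not the paper's published formulation. In the actual method, the features would come from the VLM's vision and text encoders with spectral and learnable text prompts, which are not modeled here.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length (assumes a nonzero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def class_probabilities(image_feat, text_feats, temperature=0.01):
    """Softmax over temperature-scaled cosine similarities between one image
    feature and one text feature per class (CLIP-style scoring).
    The temperature value is an assumption, not taken from the paper."""
    logits = [cosine(image_feat, t) / temperature for t in text_feats]
    top = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Standard classification loss on the predicted class distribution."""
    return -math.log(probs[label])


def class_discriminative_penalty(text_feats):
    """Hypothetical stand-in for the paper's class-discriminative loss:
    mean cosine similarity over all pairs of distinct class text features.
    Lowering this value pushes the class text features apart."""
    n = len(text_feats)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(cosine(text_feats[i], text_feats[j]) for i, j in pairs) / len(pairs)


def total_loss(image_feat, text_feats, label, weight=0.1):
    """Cross-entropy plus a weighted discriminative term (weight is assumed)."""
    probs = class_probabilities(image_feat, text_feats)
    return cross_entropy(probs, label) + weight * class_discriminative_penalty(text_feats)


if __name__ == "__main__":
    # Toy 3-class example with made-up 4-dimensional features.
    text_feats = [[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0],
                  [0.7, 0.7, 0.0, 0.0]]
    image_feat = [0.9, 0.1, 0.0, 0.0]
    probs = class_probabilities(image_feat, text_feats)
    print("predicted class:", probs.index(max(probs)))
    print("discriminative penalty:", round(class_discriminative_penalty(text_feats), 3))
    print("total loss:", total_loss(image_feat, text_feats, label=0))
```

The discriminative term depends only on the text features. In a prompt-tuning setup, where the encoders are typically kept frozen, its gradient would therefore act on the learnable text prompts. The sketch only evaluates the loss and does not perform any training.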
What problem does this paper attempt to address?