LVM-StARS: Large Vision Model Soft Adaption for Remote Sensing Scene Classification

Bohan Yang, Yushi Chen, Pedram Ghamisi
DOI: https://doi.org/10.1109/lgrs.2024.3432069
IF: 5.343
2024-08-18
IEEE Geoscience and Remote Sensing Letters
Abstract: Recently, both large language models and large vision models (LVMs) have gained significant attention. Trained on large-scale datasets, these models have shown remarkable capabilities across various research domains. This letter explores LVM-based methods for improving the accuracy of remote sensing (RS) scene classification. Because RS images differ from natural images, directly transferring LVMs to RS tasks is impractical. We therefore append learnable prompt tokens to the input tokens while freezing the backbone weights, which reduces the number of trainable parameters and makes the LVM weights easier to harness and transfer. Considering the latent catastrophic forgetting induced by ordinary finetuning and the inherent complexity and redundancy of RS images, we introduce soft adaption mechanisms between backbone layers on top of prompt tuning and implement the first LVM tuning method, the Large Vision Model Soft Adaption for RS scene classification (LVM-StARS), in two variants, LVM-StARS-Deep and LVM-StARS-Shallow, to make LVMs more suitable for RS scene classification. The proposed methods are evaluated on two popular RS scene classification datasets, where they outperform other state-of-the-art methods. The experimental results show that the proposed method improves overall accuracy (OA) by 1.71%–3.94% while updating only 0.1%–0.5% of the parameters compared to full finetuning.
Imaging Science & Photographic Technology; Remote Sensing; Engineering, Electrical & Electronic; Geochemistry & Geophysics
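The abstract's description of prompt tuning (learnable prompt tokens attached to the input tokens of a frozen backbone, in a shallow and a deep variant) can be made concrete with some simple bookkeeping. The following is a minimal sketch, not the authors' implementation: the backbone dimensions (hidden size 768, 12 layers, roughly 86 million parameters), the prompt count of 10, and all function names are assumptions chosen for illustration. It also assumes that "shallow" means prompts at the input only and "deep" means a separate prompt set per layer, as is common in visual prompt tuning. It counts prompt parameters only and does not model the soft adaption modules or the classification head, so it is not expected to reproduce the reported 0.1%–0.5% figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackboneSpec:
    """Hypothetical frozen transformer backbone (all values are assumptions)."""
    hidden_dim: int = 768            # token embedding width
    num_layers: int = 12             # number of transformer blocks
    total_params: int = 86_000_000   # rough size of the frozen backbone


def prompt_params(spec: BackboneSpec, num_prompts: int, deep: bool) -> int:
    """Count learnable prompt parameters.

    Shallow: one set of prompt tokens, inserted at the input only.
    Deep: a separate set of prompt tokens for every layer.
    """
    per_layer = num_prompts * spec.hidden_dim
    return per_layer * (spec.num_layers if deep else 1)


def sequence_length(num_patches: int, num_prompts: int) -> int:
    """Tokens seen by a layer: one class token + prompts + image patches."""
    return 1 + num_prompts + num_patches


def trainable_fraction(spec: BackboneSpec, num_prompts: int, deep: bool) -> float:
    """Prompt parameters as a share of the frozen backbone's parameters."""
    return prompt_params(spec, num_prompts, deep) / spec.total_params


if __name__ == "__main__":
    spec = BackboneSpec()
    for name, deep in (("shallow", False), ("deep", True)):
        count = prompt_params(spec, num_prompts=10, deep=deep)
        share = trainable_fraction(spec, num_prompts=10, deep=deep)
        print(f"{name}: {count} prompt parameters ({share:.4%} of the backbone)")
```

Under these assumed sizes, the prompt tokens amount to a very small share of the backbone in both variants, which is the intuition behind updating only a small fraction of the parameters instead of finetuning the whole model.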