Rethinking Remote Sensing Pretrained Model: Instance-Aware Visual Prompting for Remote Sensing Scene Classification.

Leyuan Fang, Yang Kuang, Qiang Liu, Yi Yang, Jun Yue
DOI: https://doi.org/10.1109/tgrs.2023.3336283
IF: 8.2
2023-01-01
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Large-scale pretrained models, such as vision transformers (ViTs), have made significant progress in remote sensing (RS) scene classification tasks. For a new scene classification task, it is common to fully fine-tune the pretrained model parameters to avoid training from scratch. Although this approach achieves satisfactory results, it imposes a heavy computation and storage burden, which limits the transferability of large pretrained models to different RS scene classification tasks. To address this challenge, we propose a parameter-efficient tuning approach called instance-aware visual prompting (IVP), which is the first work to explore prompting in the field of RS scene classification. The proposed IVP adaptively generates prompts based on the complex backgrounds and highly variable characteristics of RS images and updates only a few parameters to transfer the pretrained RS transformer model to different scene classification tasks. Specifically, instead of adapting all model parameters, we introduce instance-specific prompt vectors into the input space. Then, considering the significant variability in RS images, we introduce an instance-level prompt generation module that generates specific prompts for each RS image by aggregating contextual information from the input. Finally, these prompt vectors calibrate the pretrained features to encode instance-specific information. Extensive experiments on three RS scene classification datasets demonstrate the superiority of IVP over other fine-tuning methods. For example, when updating just 1.1% of the parameters, the Swin transformer (Swin-T) model achieves improvements of about 1.83% and 1.42% over full fine-tuning on NWPU-19 and NWPU-28, respectively.
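The prompting idea in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. It assumes that "aggregating contextual information" is a simple mean over patch tokens and that the prompt generator is a single linear layer. The names `PromptGenerator`, `mean_pool`, and `prepend_prompts` are hypothetical. In the setup the abstract describes, the pretrained backbone stays frozen and only a small module like this one would be trained.

```python
import random


def mean_pool(tokens):
    """Average patch tokens (equal-length vectors) into one context vector.

    Assumption: mean pooling stands in for the paper's context aggregation.
    """
    dim = len(tokens[0])
    return [sum(t[i] for t in tokens) / len(tokens) for i in range(dim)]


def linear(weight, bias, x):
    """Compute y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


class PromptGenerator:
    """Hypothetical lightweight module that maps an image's context to prompts."""

    def __init__(self, dim, num_prompts, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.num_prompts = num_prompts
        # One output row per prompt coordinate: (num_prompts * dim) x dim.
        self.weight = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                       for _ in range(num_prompts * dim)]
        self.bias = [0.0] * (num_prompts * dim)

    def num_parameters(self):
        """Count the parameters that would be trained in this sketch."""
        return len(self.weight) * self.dim + len(self.bias)

    def __call__(self, tokens):
        context = mean_pool(tokens)
        flat = linear(self.weight, self.bias, context)
        # Split the flat output into num_prompts vectors of size dim.
        return [flat[k * self.dim:(k + 1) * self.dim]
                for k in range(self.num_prompts)]


def prepend_prompts(generator, tokens):
    """Place instance-specific prompts in front of the patch tokens."""
    return generator(tokens) + tokens


if __name__ == "__main__":
    gen = PromptGenerator(dim=4, num_prompts=2)
    image_tokens = [[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0, 0.0]]
    sequence = prepend_prompts(gen, image_tokens)
    print(len(sequence), "tokens;", gen.num_parameters(), "prompt parameters")
```

Because the prompts are computed from each image's own tokens, two different images receive different prompts, which is the "instance-aware" property the abstract contrasts with updating all model parameters.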