Supplementary for APPLeNet: Visual Attention Parameterized Prompt Learning for Few-Shot Remote Sensing Image Generalization using CLIP

Ankit Jha, Biplab Banerjee, M. Singha, Bhupendra S. Solanki, Shirsha Bose
Abstract: We experiment with the proposed APPLeNet on four remote sensing benchmark datasets: PatternNet [3], RSICD [4], RESISC45 [1], and MLRSNet [5]. They are described as follows. PatternNet [3] includes 38 classes, each with 800 images of 256 × 256 pixels. The images are large-scale, high-resolution images of US cities collected from Google Earth imagery for remote sensing image retrieval. The Remote Sensing Image Captioning Dataset (RSICD) [4] includes 30 classes and a total of 10,000 images of 224 × 224 pixels, with a different number of images per class. Each image also comes with five sentence descriptions, which are typically used for automatic image captioning. Here, however, we use only the images, as the captions are learnable in our approach. The Remote Sensing Image Scene Classification (RESISC45) [1] dataset includes 45 classes, each with 700 images of 256 × 256 pixels. The spatial resolution of its