Abstract: Few-shot open-set recognition is a new paradigm that leverages a limited amount of supervised data to identify specific Remote Sensing (RS) scene categories and generalize to novel ones. However, the data bias induced by the small sample size not only causes severe overfitting within base classes but also impairs the model's ability to identify RS scenes from previously unobserved categories. Furthermore, owing to environmental influences, RS images frequently exhibit large intra-class variation and comparatively small inter-class differences, which makes it harder to obtain suitable classifiers. To address these issues, we investigate the use of a Multi-modal Foundational Model (MFM) infused with essential domain knowledge to mitigate the generalization limitations encountered in few-shot scenarios. Recognizing that existing MFMs with a visual-text dual-branch structure are primarily tailored to natural scenes, we propose a custom, parameter-efficient Frequency Distribution-based Multi-modal Fine-Tuning strategy (FreqDiMFT). More specifically, within the vision branch, we address the high inter-class similarity and intra-class diversity of RS images by embedding local-global frequency distribution information to facilitate the recognition of RS scenes. To further improve the model's generalization ability after transfer, we introduce an adaptive feature refinement module designed for Transformers, which filters redundant features resulting from domain disparities. To mitigate domain drift in the textual branch, we adopt an input format that combines basic templates with RS domain expertise to generate more discriminative class prototypes. To verify the effectiveness of FreqDiMFT in a more practical setting, we collect a Large-Scale hybrid dataset (LSRS). Extensive experiments demonstrate that, even with a scant number of training samples, our strategy achieves strong performance compared with state-of-the-art models.
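The text-branch idea of combining basic templates with RS domain expertise can be illustrated with a minimal sketch. This is not the authors' implementation: the template wording, the `DOMAIN_KNOWLEDGE` entries, and the helper names `build_prompt` and `build_prompts` are all hypothetical, chosen only to show how a class label and a domain description might be merged into one prompt before it is passed to a text encoder.

```python
from __future__ import annotations

# Illustrative sketch only: the template wording and the class descriptions
# below are hypothetical and are not taken from the paper.
BASE_TEMPLATE = "a remote sensing image of the scene class {label}"

# Hypothetical domain-knowledge snippets, one per scene class.
DOMAIN_KNOWLEDGE = {
    "airport": "long paved runways, taxiways, and terminal buildings seen from above",
    "harbor": "piers and docked vessels along a shoreline",
}


def build_prompt(label: str, knowledge: dict[str, str] = DOMAIN_KNOWLEDGE) -> str:
    """Combine the basic template with a domain description when one exists."""
    prompt = BASE_TEMPLATE.format(label=label)
    description = knowledge.get(label)
    if description:
        # Append the domain description so the prompt carries class-specific cues.
        prompt = f"{prompt}, which typically shows {description}"
    return prompt + "."


def build_prompts(labels: list[str]) -> dict[str, str]:
    """Return one text prompt per class label, ready for a text encoder."""
    return {label: build_prompt(label) for label in labels}


if __name__ == "__main__":
    for label, prompt in build_prompts(["airport", "harbor", "forest"]).items():
        print(f"{label}: {prompt}")
```

Classes without a description fall back to the bare template, so the sketch degrades gracefully for labels that have no domain annotation. In a dual-branch model, the encoded prompts would serve as class prototypes.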

Frequency-Aware Multi-Modal Fine-Tuning for Few-Shot Open-Set Remote Sensing Scene Classification

Few-Shot Object Detection with Multi-level Information Interaction for Optical Remote Sensing Images

Few-Shot Remote Sensing Scene Classification Via Subspace Based on Multiscale Feature Learning

Multi-scale fusion for few-shot remote sensing image classification

InfRS: Incremental Few-Shot Object Detection in Remote Sensing Images

RS-MetaNet: Deep Meta Metric Learning for Few-Shot Remote Sensing Scene Classification

Feature Transformation for Cross-domain Few-shot Remote Sensing Scene Classification

Subspace Prototype Learning for Few-Shot Remote Sensing Scene Classification

Class Centralized Dictionary Learning for Few-Shot Remote Sensing Scene Classification

Few-Shot Remote Sensing Scene Classification with Multi-Metric Fusion

Few-Shot Object Detection on Remote Sensing Images via Shared Attention Module and Balanced Fine-Tuning Strategy

Multi-attention DeepEMD for Few-Shot Learning in Remote Sensing

Learning transferable cross-modality representations for few-shot hyperspectral and LiDAR collaborative classification

Retentive Compensation and Personality Filtering for Few-Shot Remote Sensing Object Detection

Meta-FSEO: A Meta-Learning Fast Adaptation with Self-Supervised Embedding Optimization for Few-Shot Remote Sensing Scene Classification

Metric-based Few-shot Classification in Remote Sensing Image

Not Just Learning from Others but Relying on Yourself: A New Perspective on Few-Shot Segmentation in Remote Sensing

Few-Shot Scene Classification of Optical Remote Sensing Images Leveraging Calibrated Pretext Tasks

Exploring Hard Samples in Multiview for Few-Shot Remote Sensing Scene Classification