Abstract: Deep learning models are essential for scene classification, change detection, land cover segmentation, and other remote sensing image understanding tasks. Most backbones of existing remote sensing deep learning models are typically initialized by pre-trained weights obtained from ImageNet pre-training (IMP). However, domain gaps exist between remote sensing images and natural images (e.g., ImageNet), making deep learning models initialized by pre-trained weights of IMP perform poorly for remote sensing image understanding. Although some pre-training methods are studied in the remote sensing community, current remote sensing pre-training methods face the problem of vague generalization by only using remote sensing images. In this paper, we propose a novel remote sensing pre-training framework, Generic Knowledge Boosted Remote Sensing Pre-training (GeRSP), to learn robust representations from remote sensing and natural images for remote sensing understanding tasks. GeRSP contains two pre-training branches: (1) A self-supervised pre-training branch is adopted to learn domain-related representations from unlabeled remote sensing images. (2) A supervised pre-training branch is integrated into GeRSP for general knowledge learning from labeled natural images. Moreover, GeRSP combines two pre-training branches using a teacher-student architecture to simultaneously learn representations with general and special knowledge, which generates a powerful pre-trained model for deep learning model initialization. Finally, we evaluate GeRSP and other remote sensing pre-training methods on three downstream tasks, i.e., object detection, semantic segmentation, and scene classification. The extensive experimental results consistently demonstrate that GeRSP can effectively learn robust representations in a unified manner, improving the performance of remote sensing downstream tasks.

What problem does this paper attempt to address?

The problem this paper attempts to solve is: **how to improve the pre-training of deep learning models for remote sensing image understanding by combining knowledge from natural images and remote sensing images**.

Specifically, most existing remote sensing models initialize their backbones with weights obtained from ImageNet pre-training (IMP), that is, pre-training on a natural-image dataset. However, because of the domain gap between remote sensing images and natural images (for example, differences in viewpoint, resolution, and object appearance), models initialized this way perform poorly on remote sensing image understanding tasks. Methods that pre-train only on remote sensing images, in turn, suffer from limited generalization ability.

To address both problems, the paper proposes a new pre-training framework, **Generic Knowledge Boosted Remote Sensing Pre-training (GeRSP)**, which aims to learn more robust representations by combining knowledge from natural images and remote sensing images, thereby improving performance on remote sensing image understanding tasks.

### Main contributions of GeRSP:

1. **Proposes a new pre-training framework for remote sensing images**: GeRSP learns general knowledge and domain-specific knowledge simultaneously by combining supervised pre-training and self-supervised pre-training.
2. **Contains two pre-training branches**:
   - **Self-supervised pre-training branch**: Learns domain-related features from unlabeled remote sensing images.
   - **Supervised pre-training branch**: Learns general knowledge from labeled natural images.
3. **Combines the two pre-training branches through a teacher-student architecture**: This lets the model learn general knowledge and domain-specific knowledge at the same time, producing a powerful pre-trained model for initialization.
4. **Evaluated on three downstream tasks**: Object detection, semantic segmentation, and scene classification. The experimental results show that GeRSP improves performance on these remote sensing image understanding tasks.

### Core ideas of the paper:

- **Enhance generalization ability by using the diversity of natural images**: Natural images provide a wide range of general knowledge, while remote sensing images contain domain-specific knowledge. Combining the two compensates for the deficiencies of a single data source.
- **Address the limitations of existing pre-training methods**: Existing remote sensing pre-training methods either rely on natural images and ignore the domain gap, or use only remote sensing images and generalize poorly. GeRSP overcomes these limitations through joint training.

In this way, GeRSP not only improves performance on remote sensing image understanding tasks but also suggests a direction for future research: how to better combine knowledge from different domains to improve the generalization ability and robustness of a model.
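The two-branch, teacher-student setup described above can be illustrated with a small sketch. It is not the paper's implementation: the summary does not state which losses or update rule GeRSP uses, so the code assumes a cross-entropy loss for the supervised branch, an InfoNCE-style contrastive loss for the self-supervised branch, and a teacher updated as an exponential moving average (EMA) of the student. All function names, the `weight` balance term, and the default `temperature` and `momentum` values are hypothetical.

```python
import math


def cross_entropy(logits, label):
    """Assumed supervised loss for one labeled natural image."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v)) or 1.0  # avoid dividing by zero
    return [x / norm for x in v]


def info_nce(query, positive, negatives, temperature=0.2):
    """Assumed self-supervised loss for one unlabeled remote sensing image.

    `query` comes from the student; `positive` and `negatives` are
    teacher embeddings of another view and of other images.
    """
    q = normalize(query)
    keys = [positive] + list(negatives)
    logits = [dot(q, normalize(k)) / temperature for k in keys]
    return cross_entropy(logits, 0)  # the positive key sits at index 0


def joint_loss(nat_logits, nat_label, rs_query, rs_key, rs_negatives, weight=1.0):
    """Sum of both branch losses; `weight` is a hypothetical balance term."""
    supervised = cross_entropy(nat_logits, nat_label)
    self_supervised = info_nce(rs_query, rs_key, rs_negatives)
    return supervised + weight * self_supervised


def ema_update(teacher, student, momentum=0.99):
    """Assumed teacher update: moving average of the student's weights."""
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


if __name__ == "__main__":
    loss = joint_loss(
        nat_logits=[2.0, 0.5, -1.0], nat_label=0,
        rs_query=[1.0, 0.0], rs_key=[0.9, 0.1], rs_negatives=[[0.0, 1.0]],
    )
    print(f"joint loss: {loss:.4f}")
    print("teacher:", ema_update([1.0, 2.0], [0.0, 0.0], momentum=0.9))
```

In this sketch the student receives gradients from both losses in the same step, which is one way to read "simultaneously learn general and domain-specific knowledge", while the teacher only tracks the student and supplies targets for the contrastive branch.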

Generic Knowledge Boosted Pre-training For Remote Sensing Images

A lightweight and stochastic depth residual attention network for remote sensing scene classification

Geographical Knowledge-driven Representation Learning for Remote Sensing Images

An Empirical Study of Remote Sensing Pretraining

Task Specific Pretraining with Noisy Labels for Remote Sensing Image Segmentation

Training general representations for remote sensing using in-domain knowledge

SAR-HUB: Pre-Training, Fine-Tuning, and Explaining

Self-supervised Audiovisual Representation Learning for Remote Sensing Data

Supervised and Contrastive Self-Supervised In-Domain Representation Learning for Dense Prediction Problems in Remote Sensing

Do we still need ImageNet pre-training in remote sensing scene classification?

Generative ConvNet Foundation Model With Sparse Modeling and Low-Frequency Reconstruction for Remote Sensing Image Interpretation

RS-SSKD: Self-Supervision Equipped with Knowledge Distillation for Few-Shot Remote Sensing Scene Classification

RS-Dseg: semantic segmentation of high-resolution remote sensing images based on a diffusion model component with unsupervised pretraining

Towards Geospatial Foundation Models via Continual Pretraining

Domain Adaptive Remote Sensing Scene Recognition via Semantic Relationship Knowledge Transfer

SatlasPretrain: A Large-Scale Dataset for Remote Sensing Image Understanding

Text guided zero-shot scene classification of high spatial resolution remote sensing images

MutDet: Mutually Optimizing Pre-training for Remote Sensing Object Detection

LSKNet: A Foundation Lightweight Backbone for Remote Sensing

Multi-Label Guided Soft Contrastive Learning for Efficient Earth Observation Pretraining