Abstract: Geospatial technologies are becoming increasingly essential in our world for a wide range of applications, including agriculture, urban planning, and disaster response. To help improve the applicability and performance of deep learning models on these geospatial tasks, various works have begun investigating foundation models for this domain. Researchers have explored two prominent approaches for introducing such models in geospatial applications, but both have drawbacks in terms of limited performance benefit or prohibitive training cost. Therefore, in this work, we propose a novel paradigm for building highly effective geospatial foundation models with minimal resource cost and carbon impact. We first construct a compact yet diverse dataset from multiple sources to promote feature diversity, which we term GeoPile. Then, we investigate the potential of continual pretraining from large-scale ImageNet-22k models and propose a multi-objective continual pretraining paradigm, which leverages the strong representations of ImageNet while simultaneously providing the freedom to learn valuable in-domain features. Our approach outperforms previous state-of-the-art geospatial pretraining methods in an extensive evaluation on seven downstream datasets covering various tasks such as change detection, classification, multi-label classification, semantic segmentation, and super-resolution.

What problem does this paper attempt to address?

This paper aims to build an effective geospatial foundation model at low resource cost. Existing geospatial foundation models either deliver limited performance gains or are prohibitively expensive to train. The authors propose a new paradigm based on continual pretraining: it leverages the strong representations of large-scale ImageNet-22k models and combines them with self-supervised learning on geospatial data, with the goal of achieving better performance at lower resource cost and environmental impact.

### Background and Problems of the Paper

Geospatial technology is widely applied in fields such as agriculture, urban planning, and disaster response, so improving the applicability and performance of deep learning models on these tasks has become important. Researchers have explored two main approaches to introducing foundation models in geospatial applications:

1. **Utilizing existing foundation models from the natural image domain**: This approach is simple: publicly available ImageNet-pretrained models are fine-tuned directly. However, because of the domain gap between natural images and remote sensing images, it is not optimal for geospatial data and leaves room for performance improvement.
2. **Pretrained models specific to the geospatial domain**: These methods usually train networks from scratch on large amounts of remote sensing imagery to learn transferable in-domain representations. However, they require a large amount of data and training time. With large state-of-the-art (SOTA) Transformer models in particular, training is time-consuming and has a relatively large environmental impact.

### Proposed Method

To overcome these limitations, the authors propose a new paradigm for building an efficient geospatial foundation model in two steps:

1. **Data Selection and Construction**: Construct GeoPile, a compact yet diverse dataset collected from multiple sources to promote feature diversity and make pretraining more effective.
2. **Continual Pretraining**: Leverage the pretrained weights of a large-scale ImageNet-22k model and perform multi-objective continual pretraining on GeoPile. Specifically, a teacher-student framework with two parallel model branches is designed. The teacher uses ImageNet-22k pretrained weights, while the student starts from random initialization. The teacher provides intermediate features that guide the student, and the student simultaneously learns valuable in-domain features through the self-supervised masked image modeling (MIM) task.

### Experimental Results

The authors evaluate on seven downstream datasets covering change detection, classification, multi-label classification, semantic segmentation, and super-resolution. The results show that the proposed GFM method outperforms existing geospatial pretraining methods while also offering clear advantages in computational cost and carbon emissions.

### Main Contributions

1. **Data Selection and Construction**: A compact and diverse dataset, GeoPile, is constructed. It promotes feature diversity and makes pretraining more effective.
2. **Multi-objective Continual Pretraining Paradigm**: A multi-objective continual pretraining method is proposed. Through the teacher-student framework, it leverages the strong representations of the ImageNet-22k model while simultaneously learning valuable in-domain features.
3. **Performance and Resource Efficiency**: The method achieves better performance than existing methods on multiple downstream tasks while reducing computational cost and carbon emissions.

Together, these contributions offer a new approach to building efficient, low-resource-cost geospatial foundation models.
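
The multi-objective training signal described under Proposed Method can be illustrated with a minimal, framework-free sketch. The summary above only states that the teacher guides the student through intermediate features while the student also solves an MIM task; the concrete choices below (negative cosine similarity for the feature-guidance term, a mean absolute error over masked positions for the MIM term, a single `distill_weight` factor, and all function names) are assumptions made for illustration and are not taken from the paper.

```python
import math


def cosine_distill_loss(student_feat, teacher_feat):
    """Feature-guidance term (assumed form): negative cosine similarity
    between a student feature vector and the frozen teacher's vector."""
    dot = sum(s * t for s, t in zip(student_feat, teacher_feat))
    norm_s = math.sqrt(sum(s * s for s in student_feat))
    norm_t = math.sqrt(sum(t * t for t in teacher_feat))
    # Small epsilon guards against division by zero for all-zero features.
    return -dot / (norm_s * norm_t + 1e-12)


def masked_l1_loss(pred, target, mask):
    """MIM term (assumed form): mean absolute error computed only over
    masked positions, where mask[i] == 1 marks a hidden pixel."""
    n_masked = sum(mask)
    if n_masked == 0:
        return 0.0
    total = sum(abs(p - t) for p, t, m in zip(pred, target, mask) if m)
    return total / n_masked


def multi_objective_loss(student_feat, teacher_feat, pred, target, mask,
                         distill_weight=1.0):
    """Sum of the MIM term and the weighted feature-guidance term.
    The weighting scheme is a hypothetical choice for this sketch."""
    mim = masked_l1_loss(pred, target, mask)
    distill = cosine_distill_loss(student_feat, teacher_feat)
    return mim + distill_weight * distill


if __name__ == "__main__":
    # Toy values: identical feature directions, two masked pixels.
    loss = multi_objective_loss(
        student_feat=[1.0, 0.0],
        teacher_feat=[2.0, 0.0],
        pred=[1.0, 2.0, 3.0],
        target=[1.0, 0.0, 0.0],
        mask=[0, 1, 1],
    )
    print(f"combined loss: {loss:.4f}")
```

In a real training loop, only the student would receive gradients from this combined loss, and the teacher's features would be computed once per batch from the unmasked input. That arrangement is also an assumption of this sketch.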

Towards Geospatial Foundation Models via Continual Pretraining
