Abstract: While the pretraining of Foundation Models (FMs) for remote sensing (RS) imagery is on the rise, models remain restricted to a few hundred million parameters. Scaling models to billions of parameters has been shown to yield unprecedented benefits, including emergent abilities, but requires data scaling and computing resources typically not available outside industry R&D labs. In this work, we pair high-performance computing resources, including the Frontier supercomputer, America's first exascale system, with high-resolution optical RS data to pretrain billion-scale FMs. Our study assesses the performance of different pretrained variants of vision Transformers across image classification, semantic segmentation and object detection benchmarks; the results highlight the importance of data scaling for effective model scaling. Moreover, we discuss the construction of a novel TIU pretraining dataset and model initialization, with data and pretrained models intended for public release. By discussing technical challenges and details often lacking in the related literature, this work is intended to offer best practices to the geospatial community toward efficient training and benchmarking of larger FMs.
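To make the gap between "a few hundred million" and "billions" of parameters concrete, the Python sketch below counts the parameters of a standard ViT encoder from its width, depth and MLP size. `vit_encoder_params` is a hypothetical helper written for illustration. It assumes the usual pre-norm Transformer block with a fused QKV projection, a learned [CLS] token and learned positional embeddings, a 224 x 224 three-channel input, and no classification head. The configurations (ViT-B/16, ViT-L/16, ViT-H/14, ViT-g/14) are common public ViT settings; they are not necessarily the variants pretrained in this work.

```python
def vit_encoder_params(width: int, depth: int, mlp_dim: int,
                       patch: int = 16, img_size: int = 224, channels: int = 3) -> int:
    """Count parameters of a standard ViT encoder (no classification head)."""
    d, m = width, mlp_dim
    per_block = (
        2 * d                   # LayerNorm before attention (scale + bias)
        + 3 * d * d + 3 * d     # fused Q, K, V projection (weights + biases)
        + d * d + d             # attention output projection
        + 2 * d                 # LayerNorm before the MLP
        + d * m + m             # MLP up-projection
        + m * d + d             # MLP down-projection
    )
    n_patches = (img_size // patch) ** 2
    patch_embed = channels * patch * patch * d + d  # patchify conv: weight + bias
    cls_and_pos = d + (n_patches + 1) * d           # [CLS] token + positional embeddings
    final_norm = 2 * d
    return depth * per_block + patch_embed + cls_and_pos + final_norm


# Common public configurations: (width, depth, MLP dim, patch size).
configs = {
    "ViT-B/16": (768, 12, 3072, 16),
    "ViT-L/16": (1024, 24, 4096, 16),
    "ViT-H/14": (1280, 32, 5120, 14),
    "ViT-g/14": (1408, 40, 6144, 14),
}
for name, (w, dpt, m, p) in configs.items():
    n = vit_encoder_params(w, dpt, m, patch=p)
    print(f"{name}: {n / 1e6:.1f}M parameters")
```

Under these assumptions ViT-B/16 comes out at about 85.8M parameters, ViT-L/16 at about 303.3M, and ViT-g/14 just over 1.0B. Because each block contributes roughly 4d² + 2dm weights, width and depth dominate the count, while the patch size mainly changes the sequence length and therefore the compute per image.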

What problem does this paper attempt to address?

The paper mainly addresses the following problems:

1. **Applying large-scale models to remote sensing imagery**: Foundation models (FMs) for remote sensing imagery are currently limited to a few hundred million parameters, while large-scale models with billions of parameters have already shown significant advantages in natural language processing and computer vision, including emergent abilities. The paper therefore explores how to bring these advantages to the analysis of high-resolution satellite imagery.
2. **Data and computing resource challenges**: Training large-scale models requires large amounts of data and high-performance computing resources, which are usually difficult to obtain in academia. Using Frontier, the first exascale supercomputer in the United States, the paper explores how to overcome these challenges and provides practical methods for doing so.
3. **Model architecture and pretraining strategies**: The paper introduces several Vision Transformer (ViT) variants of different sizes and evaluates their performance on image classification, semantic segmentation, and object detection. It also discusses technical details of model initialization, dataset construction, and pretraining strategy.
4. **Generalization and label efficiency**: An important advantage of large-scale models is their ability to generalize across tasks while needing less labeled data. The paper verifies these advantages experimentally and discusses how to further improve generalization and label efficiency.
5. **Standardized benchmarking**: The paper points out that many existing studies lack the details needed for reproducibility, especially in the evaluation of downstream tasks. It therefore emphasizes the importance of standardized benchmarking and offers specific recommendations.

In summary, the paper explores the potential of large-scale models for high-resolution satellite image analysis by training and evaluating them, and provides technical details and best practices to advance the field.
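The data and compute requirements discussed above can be made concrete with a common rule of thumb from the language-model scaling literature (not a method from this paper): training a dense Transformer with N parameters on D tokens costs roughly 6·N·D FLOPs, about 2·N·D for the forward pass and 4·N·D for the backward pass. The Python sketch below turns that into a wall-clock estimate. The helper names are hypothetical, and every number in the example (dataset size, epochs, device count, per-device peak throughput, utilization) is a placeholder, not a figure from the paper or a Frontier specification. It also assumes every patch passes through the full model, which overstates the cost for masked-image-modeling encoders that see only a subset of patches.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Rule-of-thumb training compute: ~2*N*D (forward) + ~4*N*D (backward)."""
    return 6.0 * n_params * n_tokens


def vit_pretraining_tokens(n_images: int, img_size: int, patch: int, epochs: int) -> int:
    """Tokens processed during pretraining, assuming every patch is encoded."""
    patches_per_image = (img_size // patch) ** 2
    return n_images * patches_per_image * epochs


def wall_clock_hours(total_flops: float, n_devices: int,
                     peak_flops_per_device: float, utilization: float) -> float:
    """Convert a FLOP budget to hours at a given fraction of peak throughput."""
    sustained = n_devices * peak_flops_per_device * utilization
    return total_flops / sustained / 3600.0


# Hypothetical placeholders for illustration only.
n_params = 1e9
tokens = vit_pretraining_tokens(n_images=1_000_000, img_size=224, patch=14, epochs=100)
flops = training_flops(n_params, tokens)
hours = wall_clock_hours(flops, n_devices=512, peak_flops_per_device=1e14, utilization=0.4)
print(f"tokens={tokens:.3e}  FLOPs={flops:.3e}  time={hours:.1f} h")
```

With these placeholders the estimate is about 1.5 x 10^20 FLOPs, or a little over two hours. Because the cost scales with the product of N and D, growing the model and the pretraining data together multiplies the compute budget, which is one reason data scaling and large allocations tend to go hand in hand. The utilization factor is kept explicit since the achieved fraction of peak throughput varies widely with parallelism strategy and hardware.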

OReole-FM: successes and challenges toward billion-parameter foundation models for high-resolution satellite imagery

Pretraining Billion-scale Geospatial Foundational Models on Frontier

A Billion-scale Foundation Model for Remote Sensing Images

Dynamic Convolution Covariance Network Using Multi-Scale Feature Fusion for Remote Sensing Scene Image Classification

One for All: Toward Unified Foundation Models for Earth Vision

Foundation Models for Remote Sensing and Earth Observation: A Survey

Foundation Models for Generalist Geospatial Artificial Intelligence

SatVision-TOA: A Geospatial Foundation Model for Coarse-Resolution All-Sky Remote Sensing Imagery

Generative ConvNet Foundation Model With Sparse Modeling and Low-Frequency Reconstruction for Remote Sensing Image Interpretation

Enabling Foundation Models: A Distributed Collaboration Framework Based on Graph Federated Learning

Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model

Brain-Inspired Remote Sensing Foundation Models and Open Problems: A Comprehensive Survey

On the Generalizability of Foundation Models for Crop Type Mapping

When are Foundation Models Effective? Understanding the Suitability for Pixel-Level Classification Using Multispectral Imagery

Specialized Foundation Models Struggle to Beat Supervised Baselines

Uncertainty and Generalizability in Foundation Models for Earth Observation

SpectralEarth: Training Hyperspectral Foundation Models at Scale

SpectralGPT: Spectral Remote Sensing Foundation Model

Toward Foundation Models for Earth Monitoring: Proposal for a Climate Change Benchmark

MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining