Abstract: Foundation models have the potential to transform the landscape of remote sensing (RS) data analysis by enabling large computer vision models to be pre-trained on vast amounts of remote sensing data. These models can then be fine-tuned with small amounts of labeled training data and applied to a variety of applications. Most existing foundation models are designed for high spatial resolution, cloud-free satellite imagery or photos, limiting their applicability in scenarios that require frequent temporal monitoring or broad spectral profiles. As a result, foundation models trained solely on cloud-free images have limited utility for applications that involve atmospheric variables or require atmospheric corrections. We introduce SatVision-TOA, a novel foundation model pre-trained on 14-band MODIS L1B Top-Of-Atmosphere (TOA) radiance imagery, addressing the need for models pre-trained to handle moderate- and coarse-resolution all-sky remote sensing data. The SatVision-TOA model is pre-trained using a Masked-Image-Modeling (MIM) framework and the SwinV2 architecture, and learns detailed contextual representations through self-supervised learning without the need for labels. It is a 3-billion-parameter model trained on 100 million images. To our knowledge, this is the largest foundation model trained solely on satellite RS imagery. Results show that SatVision-TOA outperforms baseline methods on downstream tasks such as 3D cloud retrieval. Notably, the model achieves a mean intersection over union (mIOU) of 0.46, a substantial improvement over the baseline mIOU of 0.22. Additionally, the rate of false negatives in the fine-tuning task was reduced by over 50% compared to the baseline. Our work advances pre-trained vision modeling for multispectral RS by learning from a variety of atmospheric and aerosol conditions to improve cloud and land surface monitoring.

When are Foundation Models Effective? Understanding the Suitability for Pixel-Level Classification Using Multispectral Imagery

Generative ConvNet Foundation Model With Sparse Modeling and Low-Frequency Reconstruction for Remote Sensing Image Interpretation

Foundation Model-Based Multimodal Remote Sensing Data Classification

AI Foundation Models in Remote Sensing: A Survey

Specialized Foundation Models Struggle to Beat Supervised Baselines

Evaluating and Benchmarking Foundation Models for Earth Observation and Geospatial AI

On the Opportunities and Challenges of Foundation Models for Geospatial Artificial Intelligence

Foundation Models for Generalist Geospatial Artificial Intelligence

On the Opportunities and Challenges of Foundation Models for GeoAI (Vision Paper)

Exploring Foundation Models in Remote Sensing Image Change Detection: A Comprehensive Survey

Foundation Model-Based Spectral–Spatial Transformer for Hyperspectral Image Classification

Towards a Knowledge guided Multimodal Foundation Model for Spatio-Temporal Remote Sensing Applications

Vision foundation models: can they be applied to astrophysics data?

SpectralGPT: Spectral Remote Sensing Foundation Model

Foundation Models for Remote Sensing and Earth Observation: A Survey

A Billion-scale Foundation Model for Remote Sensing Images

A Critical Look at the Current Usage of Foundation Model for Dense Recognition Task

SatVision-TOA: A Geospatial Foundation Model for Coarse-Resolution All-Sky Remote Sensing Imagery

Low-Resource Vision Challenges for Foundation Models

When is a Foundation Model a Foundation Model

FRoundation: Are Foundation Models Ready for Face Recognition?