Abstract: As AI workloads increase in scope, generalization becomes difficult for small task-specific models, and their demand for large amounts of labeled training samples grows. In contrast, Foundation Models (FMs) are trained on internet-scale unlabeled data via self-supervised learning and have been shown to adapt to various tasks with minimal fine-tuning. Although large FMs have had significant impact in natural language processing and computer vision, efforts toward FMs for geospatial applications have been restricted to smaller models, because pretraining larger models requires very large computing resources equipped with state-of-the-art hardware accelerators. Current satellite constellations collect more than 100 TB of data a day, producing images that contain billions of pixels and are multimodal in nature. Such geospatial data poses unique challenges and opens up new opportunities for developing FMs. We investigate billion-scale FMs and HPC training profiles for geospatial applications by pretraining on publicly available data. We study, end to end, how scaling the model size affects performance and downstream impact. Our larger 3B-parameter model achieves up to a 30% improvement in top-1 scene classification accuracy compared with a 100M-parameter model. Moreover, we detail performance experiments on the Frontier supercomputer, America's first exascale system, where we study different model-parallel and data-parallel approaches using PyTorch's Fully Sharded Data Parallel library. Specifically, we study variants of the Vision Transformer (ViT) architecture, conducting performance analysis for ViT models with up to 15B parameters. By discussing throughput and performance bottlenecks under different parallelism configurations, we offer insights on how to leverage such leadership-class HPC resources when developing large models for geospatial imagery applications.
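To give a rough sense of the model sizes discussed in the abstract, the sketch below estimates the parameter count of a plain ViT encoder from its width and depth, and the per-GPU memory for model states when those states are fully sharded across devices. It is an illustrative back-of-the-envelope calculation, not the configuration or accounting used by the authors: the function names, the default patch and image sizes, and the 16 bytes per parameter (a common rule of thumb for mixed-precision Adam training) are all assumptions.

```python
def vit_param_count(width, depth, mlp_dim, patch=16, image=224, channels=3):
    """Estimate parameters of a standard ViT encoder (no classification head)."""
    # Self-attention: fused QKV projection plus output projection, with biases.
    attention = 4 * width * width + 4 * width
    # Two-layer MLP with biases.
    mlp = 2 * width * mlp_dim + mlp_dim + width
    # Two LayerNorms per block, each with a scale and a shift vector.
    norms = 4 * width
    blocks = depth * (attention + mlp + norms)

    # Patch embedding implemented as a linear map over flattened patches.
    patch_embed = patch * patch * channels * width + width
    # Learned positional embeddings for every patch plus one class token.
    num_patches = (image // patch) ** 2
    pos_embed = (num_patches + 1) * width
    cls_token = width
    final_norm = 2 * width
    return blocks + patch_embed + pos_embed + cls_token + final_norm


def sharded_state_gib(n_params, n_gpus, bytes_per_param=16):
    """Per-GPU GiB for weights, gradients, and optimizer states when fully sharded.

    bytes_per_param=16 is an assumed rule of thumb for mixed-precision Adam;
    activations and communication buffers are not included.
    """
    return n_params * bytes_per_param / n_gpus / 2**30


if __name__ == "__main__":
    base = vit_param_count(width=768, depth=12, mlp_dim=3072)
    print(f"ViT-Base-like encoder: {base / 1e6:.1f}M parameters")
    # Hypothetical example: model states of a 15B-parameter model over 64 GPUs.
    print(f"{sharded_state_gib(15e9, 64):.2f} GiB of model states per GPU")
```

Under these assumptions, per-GPU model-state memory shrinks linearly with the number of GPUs in the sharding group, which is the basic reason fully sharded training makes multi-billion-parameter ViTs feasible at all; the throughput cost of the extra communication is what the performance study in the abstract examines.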

Pretraining Billion-scale Geospatial Foundational Models on Frontier
