On-Demand Earth System Data Cubes

David Montero, César Aybar, Chaonan Ji, Guido Kraemer, Maximilian Söchting, Khalil Teber, Miguel D. Mahecha
2024-04-19
Abstract: Advancements in Earth system science have seen a surge in diverse datasets. Earth System Data Cubes (ESDCs) have been introduced to efficiently handle this influx of high-dimensional data. ESDCs offer a structured, intuitive framework for data analysis, organising information within spatio-temporal grids. The structured nature of ESDCs unlocks significant opportunities for Artificial Intelligence (AI) applications. By providing well-organised data, ESDCs are ideally suited for a wide range of sophisticated AI-driven tasks. An automated framework for creating AI-focused ESDCs with minimal user input could significantly accelerate the generation of task-specific training data. Here we introduce cubo, an open-source Python tool designed for easy generation of AI-focused ESDCs. Utilising collections in SpatioTemporal Asset Catalogs (STAC) that are stored as Cloud Optimised GeoTIFFs (COGs), cubo efficiently creates ESDCs, requiring only central coordinates, spatial resolution, edge size, and time range.
Databases, Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
The paper introduces cubo, an open-source Python tool that simplifies the generation of AI-specific Earth System Data Cubes (ESDCs). Using collections in SpatioTemporal Asset Catalogs (STAC) that are stored as Cloud Optimized GeoTIFFs (COGs), cubo creates structured ESDCs from minimal user input, which can then serve as training data for AI-driven tasks. Specifically, the paper addresses the following issues:

1. **The need for AI-specific ESDCs**: Although tools for generating ESDCs exist, none offers a systematic way to generate, on demand, ESDCs with equal-length spatial grids tailored to AI applications.
2. **Simplifying the generation process**: cubo introduces a small set of parameters that characterize an AI-specific ESDC. Users provide only central coordinates, edge size, spatial resolution, and time range, and the data cube is generated automatically.
3. **Compatibility and flexibility**: cubo works with any COG collection catalogued in STAC and can retrieve data from different sources, so it can serve research needs in different regions.
4. **Application scenario examples**: The paper demonstrates cubo with two examples: creating diverse ESDCs at multiple locations around the globe using different parameters, and generating standardized ESDCs from multiple datasets at a single location using the same parameters.

In summary, cubo simplifies the generation of AI-specific ESDCs and can help accelerate AI-related data analysis in Earth system science research.
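The parameter set described above can be illustrated with a minimal Python sketch. The `CubeSpec` class and the `square_bounds` and `build_cube` helpers are hypothetical names introduced here for illustration and are not part of cubo. `square_bounds` assumes the centre is already expressed in a projected coordinate system with metre units, and is only a simplified picture of how a square cube extent follows from edge size and resolution. The keyword arguments passed to `cubo.create` follow the package's documented usage as best recalled and should be checked against the installed version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CubeSpec:
    """Hypothetical container for the parameters named in the abstract."""

    lat: float          # central latitude (degrees)
    lon: float          # central longitude (degrees)
    edge_size: int      # pixels along each spatial edge
    resolution: float   # pixel size (metres)
    start_date: str     # start of the time range, e.g. "2021-06-01"
    end_date: str       # end of the time range, e.g. "2021-06-10"

    @property
    def side_length_m(self) -> float:
        """Ground length of one cube edge in metres."""
        return self.edge_size * self.resolution


def square_bounds(center_x, center_y, edge_size, resolution):
    """Return (min_x, min_y, max_x, max_y) of a square centred on a point.

    Assumes center_x and center_y are projected coordinates in metres.
    """
    half = edge_size * resolution / 2
    return (center_x - half, center_y - half, center_x + half, center_y + half)


def build_cube(spec, collection, bands):
    """Request a cube from cubo for one STAC collection.

    Requires the cubo package and network access. The import is deferred
    so that the helpers above work without cubo installed.
    """
    import cubo

    return cubo.create(
        lat=spec.lat,
        lon=spec.lon,
        collection=collection,
        bands=bands,
        start_date=spec.start_date,
        end_date=spec.end_date,
        edge_size=spec.edge_size,
        resolution=spec.resolution,
    )
```

Under these assumptions, a single `CubeSpec` could be reused across several collections to obtain cubes with matching extent at one location, in the spirit of the paper's second example, e.g. `build_cube(spec, "sentinel-2-l2a", ["B02", "B03", "B04"])`, where the collection and band names are illustrative and depend on the STAC catalogue being queried.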