The ESGF Virtual Aggregation (CMIP6 v20240125)

Ezequiel Cimadevilla, Bryan Lawrence, Antonio Santiago Cofiño
DOI: https://doi.org/10.5194/gmd-2024-120
IF: 5.1
2024-09-10
Geoscientific Model Development
Abstract: The Earth System Grid Federation (ESGF) holds several petabytes of climate data stored in millions of files at data centers worldwide. Obtaining and manipulating the scientific information (climate variables) held in these files is non-trivial. The ESGF Virtual Aggregation is one of several solutions for providing an out-of-the-box, aggregated, analysis-ready view of those variables. Here we discuss the ESGF Virtual Aggregation in the context of the existing infrastructure and of other solutions that provide analysis-ready data. We describe how it is constructed and how it can be used, and we provide a performance evaluation. We show that the ESGF Virtual Aggregation provides a sustainable solution to some of the problems encountered in producing analysis-ready data, without the cost of replicating data into different formats, albeit at the cost of more data movement during analysis than some alternatives. If heavily used, it may also require more ESGF data servers than are currently deployed at data nodes. The need for such data servers should be part of ongoing discussions about the future of the ESGF and its constituent core services.
geosciences, multidisciplinary