Coordinating an operational data distribution network for CMIP6 data

Ruth Petrie, Sébastien Denvil, Sasha Ames, Guillaume Levavasseur, Sandro Fiore, Chris Allen, Fabrizio Antonio, KatharinaPier Berger, Pierre-Antoine Bretonnière, Luca Cinquini, Eli Dart, Prashanth Dwarakanath, Kelsey Druken, Ben Evans, Laurent Franchistéguy, Sébastien Gardoll, Eric Gerbier, Mark Greenslade, David Hassell, Alan Iwi, Martin Juckes, Stephan Kindermann, Lukasz Lacinski, Maria Mirto, Atef Ben Nasser, Paola Nassisi, Eric Nienhouse, Sergey Nikonov, Alessandra Nuzzo, Clare Richards, Syazwan Ridzwan, Michel Rixen, Kim Serradell, Kate Snow, Ag Stephens, Martina Stockhause, Hans Vahlenkamp, Rick Wagner
DOI: https://doi.org/10.5194/gmd-2020-153
2020-06-30
Abstract: Data contributed to the Coupled Model Intercomparison Project Phase 6 (CMIP6) are distributed via the Earth System Grid Federation (ESGF). The ESGF is a network of internationally distributed sites that together work as a federated data archive. Data records from climate modelling institutes are published on the ESGF and then shared around the world. It is anticipated that CMIP6 will produce O(20 PB) of data to be published and distributed via the ESGF. In addition to this large volume of data, a number of value-added CMIP6 services are required to interact with the ESGF; for example, the Citation and Errata services both interact with the ESGF but are not a core part of its infrastructure. Because of the number of interacting services and the large volume of data anticipated for CMIP6, a CMIP Data Node Operations Team (CDNOT) was formed. The CDNOT coordinated and implemented a series of CMIP6 preparation data challenges to test all the interacting components in the ESGF CMIP6 software ecosystem. This ensured that when CMIP6 data were released they could be reliably distributed.
What problem does this paper attempt to address?