Planetary computing for data-driven environmental policy-making

Patrick Ferris, Michael Dales, Sadiq Jaffer, Amelia Holcomb, Eleanor Toye Scott, Thomas Swinfield, Alison Eyres, Andrew Balmford, David Coomes, Srinivasan Keshav, Anil Madhavapeddy
2024-06-02
Abstract: We make a case for "planetary computing" -- infrastructure to handle the ingestion, transformation, analysis and publication of global data products for furthering environmental science and enabling better informed policy-making. We draw on our experiences as a team of computer scientists working with environmental scientists on forest carbon and biodiversity preservation, and classify existing solutions by their flexibility in scalably processing geospatial data, and also how well they support building trust in the results via traceability and reproducibility. We identify research gaps in the intersection of computing and environmental science around how to handle continuously changing datasets that are often collected across decades and require careful access control rather than being fully open access.
Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
The paper primarily explores the concept of "Planetary Computing" and its application in environmental science research and policy-making. The author team, composed of computer scientists and environmental scientists, highlights issues in existing data processing systems when supporting large-scale environmental data analysis based on their collaborative experiences over the past few years. They propose planetary computing as a solution. The key issues raised in the paper include:

1. **Data Uncertainty**: Variations in raw datasets lead to differences in analysis results, affecting the reproducibility of research. For example, datasets used for land use classification are updated over time, but older versions are not easily accessible, making it difficult to directly compare research results based on different versions of the data.
2. **Code Uncertainty**: Environmental scientists often lack best practices in software engineering, such as version control and testing, when conducting data analysis. This makes it challenging to track results from specific versions.
3. **Dependency Uncertainty**: Changes in software libraries or hardware can result in different outputs even with the same input. This uncertainty is a challenge for environmental projects that require long-term stability.
4. **Policy Uncertainty**: Policies derived from inaccurate data can have negative impacts. Additionally, environmental science research inherently contains some uncertainty, which is amplified when these studies are translated into policy recommendations.

To address these issues, the paper proposes the concept of planetary computing, aiming to build an infrastructure capable of effectively handling global environmental data. This infrastructure will support the ingestion, transformation, analysis, and publication of data to advance environmental science and improve the accuracy and transparency of policy-making.
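The data, code, and dependency uncertainties listed above can be made concrete with a small provenance record. The following is a minimal sketch, not the paper's method: the function names `build_manifest` and `manifest_id`, the manifest fields, and the example package names are all hypothetical, chosen only to illustrate pinning a result to the exact inputs that produced it.

```python
import hashlib
import json
import platform
from importlib import metadata


def build_manifest(dataset: bytes, code_version: str, packages: list[str]) -> dict:
    """Record what a result depends on: data, code, and software environment.

    All field names here are illustrative assumptions, not a standard.
    """
    deps = {}
    for name in packages:
        try:
            deps[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            deps[name] = None  # not installed; recorded explicitly rather than skipped
    return {
        # Data uncertainty: a digest identifies the exact dataset version used.
        "dataset_sha256": hashlib.sha256(dataset).hexdigest(),
        # Code uncertainty: e.g. a version-control commit identifier.
        "code_version": code_version,
        # Dependency uncertainty: interpreter and library versions.
        "python": platform.python_version(),
        "dependencies": deps,
    }


def manifest_id(manifest: dict) -> str:
    """Derive a stable identifier from the manifest's canonical JSON form."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    m = build_manifest(b"land-cover-v1", "abc123", ["numpy"])
    print(json.dumps(m, indent=2))
    print("manifest id:", manifest_id(m))
```

Two results can then be compared directly only if their manifests agree on the dataset digest; if the digests differ, the difference in results may come from the data rather than the method.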
To build such an infrastructure, the paper emphasizes several key requirements:

- **Scalability and Accessibility**: Users should be able to easily add new functionality to the system, and it should be accessible and usable by non-expert users.
- **Confidentiality**: Measures need to be taken to ensure the security of sensitive data, such as wildlife location information that could lead to illegal hunting activities.
- **Traceability and Interpretability**: Research results need to be traceable back to their original data and code, and the principles of the algorithms should be expressed clearly enough to be understood by non-specialist decision-makers.
- **Reproducibility**: Research results should be independently verifiable, yielding consistent results even in different computing environments.

In summary, this paper aims to address the data processing challenges in current environmental science research and proposes a new computational framework, planetary computing, to support a more reliable, transparent, and sustainable environmental policy-making process.
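As an illustration of the confidentiality requirement, one common way to reduce the risk of exposing sensitive locations is to publish coordinates only at a coarse grid resolution. This is a hedged sketch of that general idea, not an approach the paper prescribes: the function name `coarsen`, the default cell size of 0.5 degrees, and the sample coordinates are assumptions made for the example.

```python
import math


def coarsen(lat: float, lon: float, cell_deg: float = 0.5) -> tuple[float, float]:
    """Snap a coordinate to the south-west corner of its grid cell.

    Every point inside the same cell maps to the same output, so the
    published value reveals only the cell, not the exact location.
    """
    if cell_deg <= 0:
        raise ValueError("cell_deg must be positive")
    return (
        math.floor(lat / cell_deg) * cell_deg,
        math.floor(lon / cell_deg) * cell_deg,
    )


if __name__ == "__main__":
    # Two hypothetical nearby sightings fall in the same 0.5-degree cell.
    print(coarsen(-1.2921, 36.8219))
    print(coarsen(-1.30, 36.90))
```

Coarsening trades precision for protection, so the cell size would need to be chosen per dataset; it complements access control rather than replacing it.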