How Interpretable Machine Learning Can Benefit Process Understanding in the Geosciences
Shijie Jiang, Lily-belle Sweet, Georgios Blougouras, Alexander Brenning, Wantong Li, Markus Reichstein, Joachim Denzler, Shangguan Wei, Guo Yu, Feini Huang, Jakob Zscheischler
DOI: https://doi.org/10.1029/2024ef004540
2024-01-01
Abstract: Interpretable Machine Learning (IML) has rapidly advanced in recent years, offering new opportunities to improve our understanding of the complex Earth system. IML goes beyond conventional machine learning by not only making predictions but also seeking to elucidate the reasoning behind those predictions. The combination of predictive power and enhanced transparency makes IML a promising approach for uncovering relationships in data that may be overlooked by traditional analysis. Despite its potential, the broader implications of IML for the field have yet to be fully appreciated. Meanwhile, the rapid proliferation of IML, still in its early stages, has been accompanied by instances of careless application. In response to these challenges, this paper focuses on how IML can effectively and appropriately aid geoscientists in advancing process understanding, an area that is often underexplored in more technical discussions of IML. Specifically, we identify pragmatic application scenarios for IML in typical geoscientific studies, such as quantifying relationships in specific contexts, generating hypotheses about potential mechanisms, and evaluating process-based models. Moreover, we present a general and practical workflow for using IML to address specific research questions. We also identify several critical and common pitfalls in the use of IML that can lead to misleading conclusions, and propose corresponding good practices. Our goal is to facilitate a broader, yet more careful and thoughtful, integration of IML into Earth science research, positioning it as a valuable data science tool capable of enhancing our current understanding of the Earth system.

Plain Language Summary: Artificial intelligence is a rapidly advancing field, in which Interpretable Machine Learning (IML) is seen as having the potential to significantly improve our understanding of Earth's complex environmental systems.
IML goes beyond the predictive power of machine learning models by also uncovering the relationships within the data that are revealed by the model's learning process. However, there is still a lack of straightforward, practical, domain-specific guidelines that would help geoscientists apply IML both more broadly and more carefully. In this paper, we aim to demonstrate the real-world benefits of IML in typical geoscientific analysis. We provide a clear, step-by-step workflow that shows how IML can be used to address specific questions. We also point out some common pitfalls in using IML and offer solutions to avoid them. Our goal is to make IML more accessible and useful to a wider range of geoscientists, and we believe that IML, if used properly and thoughtfully, can become an essential and valuable tool for advancing our understanding of complex Earth systems.

Key Points:
- We demonstrate the broader relevance of Interpretable Machine Learning (IML) to most geoscientists and highlight underexplored opportunities for its use
- We describe a workflow for the effective use of IML while cautioning against common pitfalls
- We suggest good practices for adopting IML and advocate for more careful application to ensure reliable and robust insights for the field