Knowledge Guided Representation Learning and Causal Structure Learning in Soil Science

Somya Sharma, Swati Sharma, Licheng Liu, Rishabh Tushir, Andy Neal, Robert Ness, John Crawford, Emre Kiciman, Ranveer Chandra
2023-06-16
Abstract: An improved understanding of soil can enable more sustainable land-use practices. Nevertheless, soil is called a complex, living medium because the interactions among its many processes limit our understanding of it. Process-based models and analysis of observed data provide two avenues for improving our understanding of soil processes. Observed data reflect real-world behavior but are cost-prohibitive to collect, while process-based models can generate ample synthetic data that may not be representative of reality. We propose a framework, knowledge-guided representation learning and causal structure learning (KGRCL), to accelerate scientific discoveries in soil science. The framework improves representation learning for simulated soil processes via conditional distribution matching with observed soil processes. Simultaneously, the framework leverages both observed and simulated data to learn a causal structure among the soil processes. The learned causal graph is more representative of ground truth than graphs generated by other causal discovery methods. Furthermore, the learned causal graph is leveraged in a supervised learning setup to predict the impact of fertilizer use and changing weather on soil carbon. We present results for five different locations to show the improvement in prediction performance in out-of-sample and few-shot settings.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper primarily aims to address two key issues in soil science research:
1. **Improving the understanding of soil processes**: The paper proposes a knowledge-guided representation learning and causal structure learning framework (KGRCL) that combines observational and simulated data to accelerate scientific discovery in soil science. Specifically, the framework aims to improve representation learning for simulated soil processes through conditional distribution matching with observed soil processes, and to learn the causal structure among soil processes using both observational and simulated data.
2. **Predicting soil organic carbon**: The learned causal graph is used in a supervised learning setup to improve the prediction of soil organic carbon (SOC) in new environments. Because soil organic carbon is crucial for mitigating climate change, understanding the mechanisms that drive its change is essential. The paper reports experiments in which the method predicts the impact of fertilizer use and changing weather on soil carbon.

In short, the core objective of the research is to develop a machine learning framework that reveals the causal relationships among soil processes more accurately by integrating data from different sources (observed data and simulation-generated data), and then uses this causal knowledge to improve the prediction of soil organic carbon. This approach helps deepen our understanding of soil systems and supports sustainable land management practices.
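To make the idea of conditional distribution matching more concrete, the following is a minimal sketch and not the paper's implementation. It assumes that the mismatch between simulated and observed representations can be scored with a kernel maximum mean discrepancy (MMD), and that the conditioning variable is a discrete label such as a site or management regime. Both choices, and the names `rbf_kernel`, `mmd2`, and `conditional_mmd2`, are hypothetical; the summary above does not state which discrepancy measure or conditioning variable KGRCL uses.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd2(x, y, bandwidth=1.0):
    """Biased estimate of the squared MMD between samples x and y."""
    return (
        rbf_kernel(x, x, bandwidth).mean()
        + rbf_kernel(y, y, bandwidth).mean()
        - 2.0 * rbf_kernel(x, y, bandwidth).mean()
    )


def conditional_mmd2(sim, obs, sim_cond, obs_cond, bandwidth=1.0):
    """Average squared MMD over condition values present in both data sets.

    sim, obs: arrays of shape (n_samples, n_features) holding representations.
    sim_cond, obs_cond: 1-D arrays of discrete condition labels (an assumption).
    """
    shared = np.intersect1d(sim_cond, obs_cond)
    if shared.size == 0:
        raise ValueError("simulated and observed data share no condition value")
    per_condition = [
        mmd2(sim[sim_cond == c], obs[obs_cond == c], bandwidth) for c in shared
    ]
    return float(np.mean(per_condition))
```

In a training loop, a score of this kind could serve as a penalty that pulls the representation of simulated data toward the observed data within each condition; a lower value indicates a closer match.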