A KAN-based Interpretable Framework for Process-Informed Prediction of Global Warming Potential

Jaewook Lee, Xinyang Sun, Ethan Errington, Miao Guo
2024-11-01
Abstract: Accurate prediction of Global Warming Potential (GWP) is essential for assessing the environmental impact of chemical processes and materials. Traditional GWP prediction models rely predominantly on molecular structure, overlooking critical process-related information. In this study, we present an integrative GWP prediction model that combines molecular descriptors (MACCS keys and Mordred descriptors) with process information (process title, description, and location) to improve predictive accuracy and interpretability. Using a deep neural network (DNN) model, we achieved an R-squared of 86% on test data with Mordred descriptors, process location, and description information, a 25-percentage-point improvement over the previous benchmark of 61%. XAI analysis further highlighted the significant role of process title embeddings in enhancing model predictions. To enhance interpretability, we employed a Kolmogorov-Arnold Network (KAN) to derive a symbolic formula for GWP prediction. The formula captures key molecular and process features and provides a transparent, interpretable alternative to black-box models, giving users insight into the molecular and process factors influencing GWP. Error analysis showed that the model performs reliably in densely populated data ranges, with increased uncertainty for higher GWP values. This analysis allows users to manage prediction uncertainty effectively, supporting data-driven decision-making in chemical and process design. Our results suggest that integrating both molecular and process-level information in GWP prediction models yields substantial gains in accuracy and interpretability, offering a valuable tool for sustainability assessments. Future work may extend this approach to additional environmental impact categories and refine the model to further enhance its predictive reliability.
Machine Learning, Systems and Control
What problem does this paper attempt to address?
This paper addresses two major limitations of existing global warming potential (GWP) prediction models:

1. **Lack of process and location information**
   - Traditional GWP prediction models rely mainly on molecular structure and ignore information about the production process and its geographical location. This process information includes the process title, the process description, and the location.
   - For example, in life-cycle assessment (LCA), the GWP value is determined by the cumulative production stages, including material selection, reaction path optimization, and overall process design. Predicting from molecular information alone ignores these factors and limits the accuracy of the model in practical applications.
2. **Low interpretability**
   - Most existing GWP prediction models are black-box models such as deep neural networks (DNN). These models predict well, but their internal mechanisms are hard to interpret, so users cannot easily tell which factors matter most for a GWP prediction.
   - Some studies try to improve interpretability with XAI (explainable artificial intelligence) techniques, but these methods are usually applied after model training and do not fundamentally solve the interpretability problem of black-box models.

To address these problems, the study proposes a comprehensive GWP prediction framework that combines molecular descriptors (such as MACCS keys and Mordred descriptors) with process information (such as process title, description, and location).
Specifically:

- **Integrating chemical and process information**: Process-related text (such as the process title, description, and location) is embedded into high-dimensional vectors and used together with molecular descriptors as input features, improving the accuracy and reliability of prediction.
- **Improving model interpretability**: The Kolmogorov-Arnold Network (KAN) model is used to extract symbolic formulas from data, so that the model not only predicts well but also provides transparent, interpretable results that help users understand the key factors affecting GWP.

Through these improvements, the study aims to develop a more accurate and easier-to-interpret GWP prediction model that supports sustainable chemical process design.
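The feature-integration step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the summary does not say which embedding model the paper uses, so `embed_text` here is a hypothetical stand-in (a hashed bag-of-words vector), and `build_features`, the 8-bit descriptor vector, and the example process strings are all invented for illustration. The only point it shows is the general pattern of concatenating a molecular descriptor vector with embeddings of the process title, description, and location into a single model input.

```python
import hashlib
from typing import List, Sequence


def embed_text(text: str, dim: int = 16) -> List[float]:
    """Hypothetical stand-in for a text-embedding model.

    Hashes each whitespace-separated token into one of `dim` buckets
    and L2-normalises the resulting count vector.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = sum(v * v for v in vec) ** 0.5
    return [v / norm for v in vec] if norm > 0 else vec


def build_features(
    descriptors: Sequence[float],
    title: str,
    description: str,
    location: str,
    dim: int = 16,
) -> List[float]:
    """Concatenate molecular descriptors with three process-text embeddings."""
    features = [float(x) for x in descriptors]
    for text in (title, description, location):
        features.extend(embed_text(text, dim))
    return features


# Invented example: a short MACCS-style bit vector plus process text.
maccs_like_bits = [1, 0, 0, 1, 1, 0, 1, 0]
x = build_features(
    maccs_like_bits,
    title="ethanol production from maize",
    description="fermentation and distillation of maize starch",
    location="Europe",
)
print(len(x))  # 8 descriptor values + 3 * 16 embedding values = 56
```

In a real pipeline the descriptor vector would come from a cheminformatics toolkit and the embeddings from a trained language model. The resulting vector `x` would then be the input to the DNN or KAN regressor.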