A KAN-based Interpretable Framework for Process-Informed Prediction of Global Warming Potential

Jaewook Lee, Xinyang Sun, Ethan Errington, Miao Guo

2024-11-01

Abstract: Accurate prediction of Global Warming Potential (GWP) is essential for assessing the environmental impact of chemical processes and materials. Traditional GWP prediction models rely predominantly on molecular structure, overlooking critical process-related information. In this study, we present an integrative GWP prediction model that combines molecular descriptors (MACCS keys and Mordred descriptors) with process information (process title, description, and location) to improve predictive accuracy and interpretability. Using a deep neural network (DNN) model, we achieved an R-squared of 86% on test data with Mordred descriptors, process location, and description information, representing a 25% improvement over the previous benchmark of 61%; XAI analysis further highlighted the significant role of process title embeddings in enhancing model predictions. To enhance interpretability, we employed a Kolmogorov-Arnold Network (KAN) to derive a symbolic formula for GWP prediction, capturing key molecular and process features and providing a transparent, interpretable alternative to black-box models, enabling users to gain insights into the molecular and process factors influencing GWP. Error analysis showed that the model performs reliably in densely populated data ranges, with increased uncertainty for higher GWP values. This analysis allows users to manage prediction uncertainty effectively, supporting data-driven decision-making in chemical and process design. Our results suggest that integrating both molecular and process-level information in GWP prediction models yields substantial gains in accuracy and interpretability, offering a valuable tool for sustainability assessments. Future work may extend this approach to additional environmental impact categories and refine the model to further enhance its predictive reliability.

Machine Learning, Systems and Control

What problem does this paper attempt to address?

The paper targets two major limitations of existing Global Warming Potential (GWP) prediction models:

1. **Lack of process and location information**
   - Traditional GWP prediction models rely mainly on molecular structure and ignore information about the production process: its title, its description, and its geographical location.
   - In life-cycle assessment (LCA), for example, the GWP value is determined by the cumulative production stages, including material selection, reaction path optimization, and overall process design. Predicting from molecular information alone ignores these factors and limits the model's accuracy in practical applications.
2. **Low interpretability**
   - Most existing GWP prediction models are black-box models such as deep neural networks (DNNs). They predict well, but their internal mechanisms are hard to interpret, so users cannot easily tell which factors drive a GWP prediction.
   - Some studies try to improve interpretability with explainable artificial intelligence (XAI) techniques, but these are usually applied after model training and do not fundamentally solve the interpretability problem of black-box models.

To address these problems, the study proposes a comprehensive GWP prediction framework that combines molecular descriptors (such as MACCS keys and Mordred descriptors) with process information (such as process title, description, and location). Specifically:

- **Integrating chemical and process information**: Process-related text (process title, description, and location) is embedded into high-dimensional vectors and used together with the molecular descriptors as input features, improving the accuracy and reliability of prediction.
- **Improving model interpretability**: A Kolmogorov-Arnold Network (KAN) is used to extract a symbolic formula from the data, so that the model both predicts well and provides transparent, interpretable results that help users understand the key factors affecting GWP.

With these improvements, the study aims to develop a more accurate and more easily interpreted GWP prediction model that supports sustainable chemical process design.
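The feature integration described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_features`, the toy dimensions, and the random placeholder values are all assumptions made for illustration. Real inputs would be MACCS or Mordred descriptors and embeddings produced by a text-embedding model of one's choice.

```python
import numpy as np


def build_features(mol_desc, text_embeddings):
    """Concatenate molecular descriptors with text embeddings, row by row.

    Hypothetical helper: each block has one row per sample, and the
    blocks are joined along the feature axis.
    """
    blocks = [np.asarray(mol_desc, dtype=float)]
    blocks += [np.asarray(e, dtype=float) for e in text_embeddings]
    n_samples = blocks[0].shape[0]
    for block in blocks:
        if block.ndim != 2 or block.shape[0] != n_samples:
            raise ValueError("every block must be 2-D with one row per sample")
    return np.concatenate(blocks, axis=1)


# Toy stand-ins (assumed sizes, random values) for real descriptors/embeddings.
rng = np.random.default_rng(0)
n = 4
mol_desc = rng.integers(0, 2, size=(n, 6))   # placeholder fingerprint bits
title_emb = rng.normal(size=(n, 3))          # placeholder process-title embedding
desc_emb = rng.normal(size=(n, 3))           # placeholder process-description embedding
loc_emb = rng.normal(size=(n, 2))            # placeholder location embedding

X = build_features(mol_desc, [title_emb, desc_emb, loc_emb])
print(X.shape)  # (4, 14)
```

The resulting matrix `X` is the kind of combined input that could then be passed to a regressor such as a DNN or a KAN. Plain concatenation is used here only because it is the simplest way to join the blocks.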

Ultra-early Prediction of the Process Parameters of Coal Chemical Production

Deep Learning for GWP Prediction: A Framework Using PCA, Quantile Transformation, and Ensemble Modeling

MSPA Based on Process Information Denoised with Wavelet Transform and Its Application to Chemical Process Monitoring

Efficient prediction of potential energy surface and physical properties with Kolmogorov-Arnold Networks

Interpretable GHG Emission Prediction for Papermaking Wastewater Treatment Process with Deep Learning

Water quality soft-sensor prediction in anaerobic process using deep neural network optimized by Tree-structured Parzen Estimator

KA-GNN: Kolmogorov-Arnold Graph Neural Networks for Molecular Property Prediction

Materials Properties Prediction (MAPP): Empowering the prediction of material properties solely based on chemical formulas

Interpretable machine learning to model biomass and waste gasification

Directed message passing neural network (D-MPNN) with graph edge attention (GEA) for property prediction of biofuel-relevant species

Deep Learning for Green Chemistry: An AI-Enabled Pathway for Biodegradability Prediction and Organic Material Discovery

GNN-SKAN: Harnessing the Power of SwallowKAN to Advance Molecular Representation Learning with GNNs

High Accuracy Prediction of the Post-Combustion Carbon Capture Process Parameters Using the Decision Forest Approach

Bridging the Semantic-Numerical Gap: A Numerical Reasoning Method of Cross-modal Knowledge Graph for Material Property Prediction

Genetic programming expressions for effluent quality prediction: Towards AI-driven monitoring and management of wastewater treatment plants

A flexible and efficient knowledge-guided machine learning data assimilation (KGML-DA) framework for agroecosystem prediction in the US Midwest

Process structure-based fully connected neural network for the modelling of chemical processes: A comparison between global and modular configurations

Adaptive Data‐Driven Modeling Strategy Based on Feature Selection for an Industrial Natural Gas Sweetening Process

A Two-Stage Multi-Target Domain Adaptation Framework for Prediction of Key Performance Indicators Based on Adversarial Network