Explainable Data-driven Modeling of Adsorption Energy in Heterogeneous Catalysis

Tirtha Vinchurkar, Janghoon Ock, Amir Barati Farimani
2024-05-31
Abstract: The increasing popularity of machine learning (ML) in catalysis has spurred interest in using these techniques to improve catalyst design. Our study aims to bridge the gap between physics-based studies and data-driven methodologies by integrating ML techniques with eXplainable AI (XAI). Specifically, we employ two XAI techniques: post-hoc XAI analysis and symbolic regression. These techniques help us unravel the correlation between adsorption energy and the properties of the adsorbate-catalyst system. Leveraging the Open Catalyst Dataset (OC20), a large dataset, we train multiple shallow ML models to predict adsorption energy, then apply post-hoc analysis to examine feature importance, inter-feature correlations, and the influence of individual feature values on the predicted adsorption energy. The post-hoc analysis reveals that adsorbate properties exert a greater influence than catalyst properties in our dataset. The five features with the highest Shapley values are adsorbate electronegativity, the number of adsorbate atoms, catalyst electronegativity, effective coordination number, and the sum of atomic numbers of the adsorbate molecule. Both catalyst and adsorbate electronegativity correlate positively with the predicted adsorption energy. Additionally, symbolic regression yields results consistent with the SHAP analysis: it deduces a mathematical relationship in which the adsorption energy is directly proportional to the square of the catalyst electronegativity. These correlations resemble those derived from physics-based equations in previous research. Our work establishes a robust framework that integrates ML techniques with XAI, leveraging large datasets like OC20 to enhance catalyst design through model explainability.
Machine Learning, Chemical Physics
What problem does this paper attempt to address?
This paper aims to reveal the relationship between adsorption energy and the characteristics of the adsorbate-catalyst system in heterogeneous catalysis by combining machine learning (ML) techniques with explainable artificial intelligence (XAI), specifically post-hoc XAI analysis and symbolic regression. The study has four goals:

1. **Improve the efficiency of catalyst design**: By predicting adsorption energy, the researchers hope to provide a deeper understanding of catalyst activity and selectivity, and thereby optimize catalyst design to enhance the efficiency and sustainability of chemical reactions.
2. **Bridge the gap between physics-based research and data-driven methods**: Although machine learning is increasingly applied in catalysis, its "black box" nature limits the physical insight it can offer. The paper uses post-hoc XAI analysis and symbolic regression to make the models interpretable, so that researchers can better understand the physical mechanisms behind their predictions.
3. **Identify key factors affecting adsorption energy**: By analyzing the importance of different features and their interactions, the study shows which factors (such as adsorbate electronegativity, number of adsorbate atoms, and catalyst electronegativity) have a significant impact on the predicted adsorption energy.
4. **Establish mathematical expressions relating input features to adsorption energy**: Through symbolic regression, the researchers derive formulas that describe how the input features relate to adsorption energy. These formulas are consistent with physics-based results from earlier research, which supports the model's validity.
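To make the feature-importance idea in point 3 concrete, the following is a minimal sketch of how Shapley values attribute one prediction to its input features. It is not the paper's pipeline: `toy_model`, its coefficients, the feature values, and the single `baseline` sample are all hypothetical, and the brute-force enumeration of feature orderings is only practical for a handful of features (SHAP-style tools rely on faster approximations and a background dataset instead).

```python
from itertools import permutations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample.

    Averages each feature's marginal contribution over every ordering in
    which features are switched from their baseline value to their actual
    value. Cost grows as n!, so this is only for tiny illustrative cases.
    """
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        prev = predict(current)
        for i in order:
            current[i] = x[i]          # reveal feature i
            new = predict(current)
            phi[i] += new - prev       # marginal contribution of feature i
            prev = new
    n_orders = factorial(n)
    return [p / n_orders for p in phi]


def toy_model(features):
    """Hypothetical stand-in for a trained regressor (not from the paper)."""
    chi_adsorbate, n_adsorbate_atoms, chi_catalyst = features
    return 0.8 * chi_adsorbate + 0.1 * n_adsorbate_atoms + 0.3 * chi_catalyst ** 2


# Hypothetical sample and reference point, in arbitrary units.
x = [3.0, 2.0, 2.0]
baseline = [2.0, 1.0, 1.0]

phi = shapley_values(toy_model, x, baseline)
names = ["adsorbate electronegativity", "number of adsorbate atoms",
         "catalyst electronegativity"]
for name, value in sorted(zip(names, phi), key=lambda t: -abs(t[1])):
    print(f"{name}: {value:+.3f}")
```

The attributions sum to the difference between the prediction for `x` and the prediction for `baseline`, which is the property that lets Shapley values be read as a decomposition of a single prediction.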
In summary, the main goal of this paper is to enhance the understanding of adsorption energy and its influencing factors by integrating machine learning and XAI techniques, thereby providing strong support for catalyst design.
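The symbolic-regression step can be illustrated with a deliberately simplified sketch. Assume, purely for illustration, a synthetic dataset in which energy depends on the square of a single electronegativity-like variable. The code fits a small, fixed list of candidate expressions by least squares and keeps the one with the lowest error. Real symbolic-regression tools search a much larger space of expression trees; the candidate list, the data, and the helper names `fit_linear` and `best_expression` are assumptions made for this example and do not come from the paper.

```python
import random

# Hypothetical candidate forms: each maps x to a transformed variable u,
# and the model fitted is y = a * u + b.
CANDIDATES = {
    "a*x + b": lambda x: x,
    "a*x**2 + b": lambda x: x ** 2,
    "a*x**3 + b": lambda x: x ** 3,
    "a/x + b": lambda x: 1.0 / x,
}


def fit_linear(u, y):
    """Ordinary least squares for y = a*u + b; returns (a, b, mse)."""
    n = len(u)
    mean_u = sum(u) / n
    mean_y = sum(y) / n
    s_uu = sum((ui - mean_u) ** 2 for ui in u)
    s_uy = sum((ui - mean_u) * (yi - mean_y) for ui, yi in zip(u, y))
    a = s_uy / s_uu
    b = mean_y - a * mean_u
    mse = sum((a * ui + b - yi) ** 2 for ui, yi in zip(u, y)) / n
    return a, b, mse


def best_expression(x, y):
    """Fit every candidate form and return the one with the lowest MSE."""
    fits = {}
    for name, transform in CANDIDATES.items():
        u = [transform(xi) for xi in x]
        fits[name] = fit_linear(u, y)
    best = min(fits, key=lambda name: fits[name][2])
    return best, fits[best]


# Synthetic data (not OC20): energy = 0.5 * chi**2 - 1.0 plus small noise.
rng = random.Random(0)
chi = [1.0 + 0.1 * k for k in range(20)]
energy = [0.5 * c ** 2 - 1.0 + rng.gauss(0.0, 0.01) for c in chi]

best_form, (a, b, mse) = best_expression(chi, energy)
print(f"best form: {best_form}, a={a:.3f}, b={b:.3f}, mse={mse:.2e}")
```

On this synthetic data the quadratic form wins because it is the form that generated the data. On real data, the selected expression is a hypothesis to be checked against physical reasoning, not a derived law.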