Interpreting Chemisorption Strength with AutoML-based Feature Deletion Experiments

Zhuo Li, Changquan Zhao, Haikun Wang, Yanqing Ding, Yechao Chen, Philippe Schwaller, Ke Yang, Cheng Hua, Yulian He
DOI: https://doi.org/10.26434/chemrxiv-2023-t7r63
2023-11-09
Abstract: The chemisorption energy of reactants on a catalyst surface, E_ads, is among the most informative descriptors for understanding and pinpointing the optimal catalyst. The intrinsic complexity of catalyst surfaces and chemisorption reactions makes it difficult to identify the pivotal physical quantities that determine E_ads. In response, this study proposes a novel methodology, the feature deletion experiment, based on Automatic Machine Learning (AutoML), for knowledge extraction from a high-throughput density functional theory (DFT) database. The study reveals that, for binary alloy surfaces, the geometric information of the local adsorption site is the primary physical quantity determining E_ads, ahead of the electronic and physicochemical properties of the catalyst alloys. By integrating the feature deletion experiment with instance-wise variable selection (INVASE), a neural network-based explainable AI (XAI) tool, we established the best-performing feature set containing 21 intrinsic, non-DFT-computed properties, achieving an MAE of 0.23 eV across a periodic table-wide chemical space involving more than 1,600 types of alloy surfaces and 8,400 chemisorption reactions. This study demonstrates the stability, consistency, and potential of AutoML-based feature deletion experiments in developing concise, predictive, and theoretically meaningful models for complex chemical problems with minimal human intervention.
Chemistry
What problem does this paper attempt to address?
This paper asks which physical quantities determine the chemisorption strength (E_ads) of reactants on catalyst surfaces, and it answers that question with feature deletion experiments built on automated machine learning (AutoML). The main challenge is deciding which physical quantities are crucial for predicting E_ads, given the complexity of catalyst surfaces and chemisorption reactions. Density functional theory (DFT) provides accurate data, but it is computationally expensive and its results are difficult to interpret. The study therefore proposes a methodology that uses AutoML to extract knowledge from a high-throughput DFT database and identify the quantities that determine E_ads. The main contributions of the paper are:

1. **Proposing a new method based on AutoML feature deletion experiments**: The method identifies the set of features crucial for predicting chemisorption strength. For binary alloy surfaces, the geometric information of the local adsorption site is found to be the primary physical quantity, compared with the electronic and physicochemical properties of the alloys.
2. **Combining the method with instance-wise variable selection (INVASE)**: INVASE is a neural network-based explainable artificial intelligence (XAI) tool. With this combination, the authors established a best-performing feature set of 21 intrinsic, non-DFT-computed properties, achieving a mean absolute error (MAE) of 0.23 eV.
3. **Demonstrating the stability and consistency of AutoML on complex chemical problems**: With minimal human intervention, AutoML can develop concise, predictive, and theoretically meaningful models across a broad chemical space.

In summary, the paper aims to improve the understanding of chemisorption on catalyst surfaces through this methodology, providing theoretical support for catalyst design.
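The idea of a feature deletion experiment can be illustrated with a small sketch: train a model on all features, then retrain with one feature group removed at a time and compare the resulting MAE values. The code below is only an illustration under stated assumptions, not the authors' pipeline. A simple k-nearest-neighbour regressor stands in for AutoML, the data are synthetic (generated so that the target depends mostly on the first two columns), and the group names and column assignments in `GROUPS` are hypothetical.

```python
import random

# Hypothetical feature groups (column indices). The names echo the paper's
# categories but the assignment is illustrative only.
GROUPS = {"geometric": [0, 1], "electronic": [2, 3], "physicochemical": [4, 5]}


def make_synthetic_data(n, seed=0):
    """Synthetic stand-in for a DFT database (NOT real chemisorption data).

    By construction the target depends strongly on columns 0 and 1 and only
    weakly on columns 2 and 4; columns 3 and 5 are pure noise features.
    """
    rng = random.Random(seed)
    X = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(n)]
    y = [2.0 * r[0] + 1.5 * r[1] + 0.3 * r[2] + 0.1 * r[4] + rng.gauss(0, 0.05)
         for r in X]
    return X, y


def knn_predict(train_X, train_y, x, k=5):
    """Predict by averaging the targets of the k nearest training rows."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), target)
        for row, target in zip(train_X, train_y)
    )
    return sum(target for _, target in dists[:k]) / k


def mae_without(X, y, groups, dropped, n_train):
    """Test-set MAE after deleting one feature group (None keeps all)."""
    keep = [j for name, cols in groups.items() if name != dropped for j in cols]
    Xk = [[row[j] for j in keep] for row in X]
    train_X, test_X = Xk[:n_train], Xk[n_train:]
    train_y, test_y = y[:n_train], y[n_train:]
    errors = [abs(knn_predict(train_X, train_y, x) - t)
              for x, t in zip(test_X, test_y)]
    return sum(errors) / len(errors)


X, y = make_synthetic_data(300)
baseline = mae_without(X, y, GROUPS, None, n_train=200)
results = {name: mae_without(X, y, GROUPS, name, n_train=200) for name in GROUPS}

# The group whose deletion hurts accuracy the most is ranked most important.
most_important = max(results, key=results.get)

print(f"all features: MAE = {baseline:.3f}")
for name, mae in results.items():
    print(f"without {name}: MAE = {mae:.3f}")
print("largest MAE increase when deleting:", most_important)
```

In this toy setup, deleting the group that carries most of the signal raises the error well above the baseline, while deleting a weakly informative group changes it little. The same comparison, applied with an AutoML model search in place of the fixed regressor, is the kind of evidence the paper uses to rank feature groups.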