Abstract: Finding catalytic materials with optimal properties for sustainable chemical and energy transformations is one of the pressing challenges facing our society today. Traditionally, the discovery of catalysts, or the philosopher’s stone of alchemists, has relied on a trial-and-error approach guided by physicochemical intuition. Decades-long advances in science and engineering, particularly in quantum chemistry and computing infrastructures, have popularized a paradigm of computational science for materials discovery. However, the brute-force search through a vast chemical space is hampered by its formidable cost. In recent years, machine learning (ML) has emerged as a promising approach to streamline the design of active sites by learning from data. As ML is increasingly employed to make predictions in practical settings, the demand for domain interpretability is surging. Therefore, it is of great importance to provide an in-depth review of our efforts in tackling this challenging issue in computational heterogeneous catalysis. In this Account, we present an interpretable ML framework for accelerating catalytic materials design, particularly in driving sustainable carbon, nitrogen, and oxygen cycles. By leveraging the linear adsorption-energy scaling and Brønsted–Evans–Polanyi (BEP) relationships, catalytic outcomes (i.e., activity, selectivity, and stability) of a multistep reaction can often be mapped onto one or two kinetics-informed descriptors. One especially important type of descriptor is the adsorption energy of a representative species at an active site motif, which can be computed from quantum-chemical simulations. To complement such a descriptor-based design strategy, we delineate our endeavors in incorporating domain knowledge into a data-driven ML workflow.
We demonstrate that the major drawbacks of black-box ML algorithms, e.g., poor explainability, can be largely circumvented by employing (1) physics-inspired feature engineering, (2) Bayesian statistical learning, and (3) theory-infused deep neural networks. The framework drastically facilitates the design of heterogeneous metal-based catalysts, some of which have been experimentally verified for an array of sustainable chemistries. We offer some remarks on the existing challenges, opportunities, and future directions of interpretable ML in predicting catalytic materials and, more importantly, in advancing catalysis theory beyond conventional wisdom. We envision that this Account will encourage more researchers to develop highly accurate, easily explainable, and trustworthy materials design strategies, facilitating the transition to the data science paradigm for sustainability through catalysis.

Automation and Machine Learning Augmented by Large Language Models in Catalysis Study

CataLM: Empowering Catalyst Design Through Large Language Models

An Artificial Intelligence (AI) workflow for catalyst design and optimization

Automated transition metal catalysts discovery and optimisation with AI and Machine Learning

Machine Learning for Catalysis Informatics: Recent Applications and Prospects

Catalyze Materials Science with Machine Learning

A Machine Learning and Explainable AI Framework Tailored for Unbalanced Experimental Catalyst Discovery

Machine Learning Descriptors for Data‐Driven Catalysis Study

High-throughput experimentation meets artificial intelligence: a new pathway to catalyst discovery

Interpretable Machine Learning for Catalytic Materials Design toward Sustainability

Interpretable Catalysis Models Using Machine Learning with Spectroscopic Descriptors

Toward Next-Generation Heterogeneous Catalysts: Empowering Surface Reactivity Prediction with Machine Learning

Open Challenges in Developing Generalizable Large Scale Machine Learning Models for Catalyst Discovery

Machine-learning-accelerated Discovery of Single-Atom Catalysts Based on Bidirectional Activation Mechanism

Integrating Machine Learning and Large Language Models to Advance Exploration of Electrochemical Reactions

Toward accelerated discovery of solid catalysts using extrapolative machine learning approach

Large Language Models are Catalyzing Chemistry Education

Machine-learning atomic simulation for heterogeneous catalysis

How Machine Learning Can Accelerate Electrocatalysis Discovery and Optimization