Abstract: Heterogeneous catalysts are rather complex materials that come in many classes (e.g., metals, oxides, carbides) and shapes. At the same time, the interaction of the catalyst surface with even a relatively simple gas-phase environment such as syngas (CO and H2) may already produce a wide variety of reaction intermediates ranging from atoms to complex molecules. The starting point for creating predictive maps of, e.g., surface coverages or chemical activities of potential catalyst materials is the reliable prediction of adsorption enthalpies of all of these intermediates. For simple systems, direct density functional theory (DFT) calculations are currently the method of choice. However, a wider exploration of complex materials and reaction networks generally requires enthalpy predictions at lower computational cost.

The use of machine learning (ML) and related techniques to make accurate and low-cost predictions of quantum-mechanical calculations has gained increasing attention lately. The employed approaches range from physically motivated models through hybrid physics-ΔML approaches to complete black-box methods such as deep neural networks. In recent work, we have explored the possibilities for using a compressed sensing method (Sure Independence Screening and Sparsifying Operator, SISSO) to identify sparse (low-dimensional) descriptors for the prediction of adsorption enthalpies at various active-site motifs of metals and oxides. We start from a set of physically motivated primary features such as atomic acid/base properties, coordination numbers, or band moments and let the data and the compressed sensing method find the best algebraic combination of these features. Here we take this work as a starting point to categorize and compare recent ML-based approaches with a particular focus on model sparsity, data efficiency, and the level of physical insight that one can obtain from the model.

Looking ahead, while many works to date have focused solely on predicting databases of, e.g., adsorption enthalpies, there is an emerging interest in our field in using ML predictions to answer fundamental science questions about the functioning of heterogeneous catalysts or perhaps even to design better catalysts than we know today. This task is significantly simplified in works that make use of scaling-relation-based models (volcano curves), where the model outcome is determined by only one or two adsorption enthalpies, which consequently become the sole target for ML-based high-throughput screening or design. However, the availability of cheap ML energetics also allows going beyond scaling relations. On the basis of our own work in this direction, we will discuss the additional physical insight that can be achieved by integrating ML-based predictions with traditional catalysis modeling techniques from thermal and electrocatalysis, such as the computational hydrogen electrode and microkinetic modeling, as well as the challenges that lie ahead.
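To make the idea of a sparse descriptor built from algebraic combinations of primary features more concrete, the following is a minimal sketch of a SISSO-like workflow. It is not the SISSO implementation used in the work summarized here. The feature names (`cn`, `eps_d`, `chi`), the synthetic data, and all function names are hypothetical and chosen only for illustration. The sketch expands the primary features with products and ratios, screens candidates by correlation with the current residual, and then searches small feature subsets by least squares.

```python
import itertools
import numpy as np

def build_candidates(X, names):
    """Expand primary features with pairwise products and ratios."""
    primary = {n: X[:, i] for i, n in enumerate(names)}
    cands = dict(primary)
    for (a, xa), (b, xb) in itertools.combinations(primary.items(), 2):
        cands[f"{a}*{b}"] = xa * xb
        cands[f"{a}/{b}"] = xa / xb
        cands[f"{b}/{a}"] = xb / xa
    return cands

def screen(cands, target, k):
    """Screening step: keep the k candidates most correlated with target."""
    tc = target - target.mean()
    def score(x):
        xc = x - x.mean()
        return abs(xc @ tc) / np.linalg.norm(xc)
    return sorted(cands, key=lambda n: score(cands[n]), reverse=True)[:k]

def design(cands, combo, y):
    """Design matrix: selected features plus a constant intercept column."""
    return np.column_stack([cands[n] for n in combo] + [np.ones_like(y)])

def best_sparse_model(cands, y, subspace, dim):
    """Sparsifying step: exhaustive least squares over dim-sized subsets."""
    best = None
    for combo in itertools.combinations(subspace, dim):
        A = design(cands, combo, y)
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        rmse = np.sqrt(np.mean((A @ coef - y) ** 2))
        if best is None or rmse < best[0]:
            best = (rmse, combo, coef)
    return best

def sisso_sketch(cands, y, dim=2, k=3):
    """Alternate screening on the residual with a small subset search."""
    subspace, residual, best = [], y, None
    for d in range(1, dim + 1):
        remaining = {n: x for n, x in cands.items() if n not in subspace}
        subspace += screen(remaining, residual, k)
        best = best_sparse_model(cands, y, subspace, d)
        _, combo, coef = best
        residual = y - design(cands, combo, y) @ coef
    return best

# Synthetic stand-in data (an assumption, not real adsorption enthalpies):
# the target depends on the ratio cn/eps_d and linearly on chi.
rng = np.random.default_rng(0)
names = ["cn", "eps_d", "chi"]  # hypothetical primary features
X = rng.uniform(1.0, 3.0, size=(60, 3))
y = 2.0 * X[:, 0] / X[:, 1] - 0.5 * X[:, 2] + 0.3

cands = build_candidates(X, names)
rmse, combo, coef = sisso_sketch(cands, y, dim=2, k=3)
print("selected descriptor:", combo, "RMSE:", rmse)
```

In this toy setting the target is noise-free and lies exactly in the candidate space, so a two-dimensional descriptor is sufficient. With real adsorption data, the candidate space, the descriptor dimension, and the screening size would have to be chosen by validation.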

Machine-learning models for combinatorial catalyst discovery

Computational catalyst discovery: Active classification through myopic multiscale sampling

Machine Learning Descriptors for Data‐Driven Catalysis Study

Machine learning models predict calculation outcomes with the transferability necessary for computational catalysis

Probing machine learning models based on high throughput experimentation data for the discovery of asymmetric hydrogenation catalysts

Statistical Analysis and Discovery of Heterogeneous Catalysts Based on Machine Learning from Diverse Published Data

Adsorption Enthalpies for Catalysis Modeling through Machine-Learned Descriptors

Design of Experimental Conditions with Machine Learning for Collaborative Organic Synthesis Reactions Using Transition-Metal Catalysts

Open Challenges in Developing Generalizable Large Scale Machine Learning Models for Catalyst Discovery

Towards Combinatorial Generalization for Catalysts: A Kohn-Sham Charge-Density Approach

Interpretable Catalysis Models Using Machine Learning with Spectroscopic Descriptors

Machine learning and DFT coupling: A powerful approach to explore organic amine catalysts for ring-opening polymerization reaction

Machine Learning to Develop Peptide Catalysts: Successes, Limitations, and Opportunities

Systematic Data-Driven Modeling of Bimetallic Catalyst Performance for the Hydrogenation of 5-Ethoxymethylfurfural with Variable Selection and Regularization

Unlocking Potential Catalysts: A Machine Learning Approach with Bayesian and Regression Models

Predicting outcomes of catalytic reactions using machine learning

Accelerated dinuclear palladium catalyst identification through unsupervised machine learning

Toward Next-Generation Heterogeneous Catalysts: Empowering Surface Reactivity Prediction with Machine Learning

Predicting reaction performance in C–N cross-coupling using machine learning

Automated transition metal catalysts discovery and optimisation with AI and Machine Learning