Abstract: Mitigating the climate crisis requires a rapid transition towards lower-carbon energy. Catalyst materials play a crucial role in the electrochemical reactions involved in numerous industrial processes key to this transition, such as renewable energy storage and electrofuel synthesis. To reduce the energy spent on such activities, we must quickly discover more efficient catalysts to drive electrochemical reactions. Machine learning (ML) holds the potential to efficiently model materials properties from large amounts of data, accelerating electrocatalyst design. The Open Catalyst Project OC20 dataset was constructed to that end. However, ML models trained on OC20 are still neither scalable nor accurate enough for practical applications. In this paper, we propose task-specific innovations applicable to most architectures, enhancing both computational efficiency and accuracy. This includes improvements in (1) the graph creation step, (2) atom representations, (3) the energy prediction head, and (4) the force prediction head. We describe these contributions, referred to as PhAST, and evaluate them thoroughly on multiple architectures. Overall, PhAST improves energy MAE by 4 to 42$\%$ while dividing compute time by 3 to 8$\times$ depending on the targeted task/model. PhAST also enables CPU training, leading to 40$\times$ speedups in highly parallelized settings. Python package: \url{https://phast.readthedocs.io}.

What problem does this paper attempt to address?

This paper aims to improve both the accuracy and the computational efficiency of machine-learning models used to accelerate catalyst design. For electrocatalyst materials specifically, it proposes a graph neural network (GNN) framework, PhAST (Physics-Aware, Scalable, and Task-specific GNNs), built on four improvements:

1. **Graph construction step**: A graph-construction method tailored to catalyst-adsorbate modeling. Removing or aggregating the atom nodes labeled 0 reduces the amount of computation without reducing the model's expressive power.
2. **Atomic representation**: A richer, physics-based atom representation that combines label information, atomic properties (such as atomic radius or density), and the element's group and period in the periodic table, to improve the learning of atomic features.
3. **Energy prediction head**: A weighted sum of node representations and a hierarchical pooling method predict the energy from the final atom representations, taking both atomic characteristics and graph topology into account.
4. **Force prediction head**: The forces on atoms are predicted directly, and a gradient-objective loss term encourages energy conservation while reducing memory usage and computation time.

PhAST is evaluated on multiple baseline GNN architectures. The results show that it not only lowers the mean absolute error (MAE) of the energy predictions but also substantially shortens computation time, especially on large-scale datasets. PhAST also enables CPU training, which speeds up training by up to 40 times in highly parallelized settings.

These improvements matter for electrocatalyst design and, through it, for reducing carbon emissions and advancing clean energy.
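Two of the ideas summarized above, dropping the nodes labeled 0 before message passing and pooling per-atom values with a weighted sum, can be illustrated with a minimal sketch. This is not the PhAST implementation: the function names, the plain-list graph encoding, and the per-element weight table are all hypothetical and chosen only for illustration. In a real model the weights would be learned.

```python
def remove_label0_nodes(labels, edges):
    """Drop every atom labeled 0 and every edge that touches one.

    labels : list of int, one label per atom (hypothetical encoding)
    edges  : list of (src, dst) atom-index pairs
    Returns the kept original atom indices and the re-indexed edges.
    """
    kept = [i for i, label in enumerate(labels) if label != 0]
    # Map old atom indices to their position in the pruned graph.
    new_index = {old: new for new, old in enumerate(kept)}
    pruned_edges = [
        (new_index[s], new_index[d])
        for s, d in edges
        if s in new_index and d in new_index
    ]
    return kept, pruned_edges


def weighted_sum_energy(per_atom_values, atomic_numbers, element_weights):
    """Pool per-atom scalars into one energy using a weight per element.

    element_weights is a hypothetical lookup table (atomic number -> weight)
    standing in for weights that a trained model would learn.
    """
    return sum(
        element_weights[z] * value
        for z, value in zip(atomic_numbers, per_atom_values)
    )


# Toy system: two atoms labeled 0, two labeled 1, one labeled 2.
labels = [0, 0, 1, 1, 2]
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 2)]
kept, pruned_edges = remove_label0_nodes(labels, edges)

# Per-atom values for the three remaining atoms (made-up numbers).
energy = weighted_sum_energy(
    per_atom_values=[0.5, -1.0, 2.0],
    atomic_numbers=[29, 29, 8],
    element_weights={29: 0.5, 8: 2.0},
)
```

Pruning shrinks both the node set and the edge list, which is where the compute saving comes from: every later message-passing layer operates on the smaller graph.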

PhAST: Physics-Aware, Scalable, and Task-specific GNNs for Accelerated Catalyst Design