PhAST: Physics-Aware, Scalable, and Task-specific GNNs for Accelerated Catalyst Design

Alexandre Duval, Victor Schmidt, Santiago Miret, Yoshua Bengio, Alex Hernández-García, David Rolnick
2024-03-11
Abstract: Mitigating the climate crisis requires a rapid transition towards lower-carbon energy. Catalyst materials play a crucial role in the electrochemical reactions involved in numerous industrial processes key to this transition, such as renewable energy storage and electrofuel synthesis. To reduce the energy spent on such activities, we must quickly discover more efficient catalysts to drive electrochemical reactions. Machine learning (ML) holds the potential to efficiently model materials properties from large amounts of data, accelerating electrocatalyst design. The Open Catalyst Project OC20 dataset was constructed to that end. However, ML models trained on OC20 are still neither scalable nor accurate enough for practical applications. In this paper, we propose task-specific innovations applicable to most architectures, enhancing both computational efficiency and accuracy. This includes improvements in (1) the graph creation step, (2) atom representations, (3) the energy prediction head, and (4) the force prediction head. We describe these contributions, referred to as PhAST, and evaluate them thoroughly on multiple architectures. Overall, PhAST improves energy MAE by 4 to 42% while dividing compute time by 3 to 8× depending on the targeted task/model. PhAST also enables CPU training, leading to 40× speedups in highly parallelized settings. Python package: https://phast.readthedocs.io.
Machine Learning, Computational Physics
What problem does this paper attempt to address?
The paper aims to improve both the accuracy and the computational efficiency of machine-learning models used to accelerate catalyst design. Focusing on electrocatalyst materials, it proposes PhAST (Physics-Aware, Scalable, and Task-specific GNNs). PhAST is a set of improvements to graph neural network (GNN) models that can be applied to most architectures:

1. **Graph construction step**: A graph-construction method tailored to catalyst-adsorbate modeling. Atoms tagged 0 (the subsurface catalyst atoms in OC20) are either removed or aggregated. This reduces computation without harming the model's expressive power.
2. **Atom representations**: Richer, physics-based atom representations that include tag information, atomic properties (such as atomic radius or density), and each element's group and period in the periodic table.
3. **Energy prediction head**: A weighted sum of node representations and a hierarchical pooling method that predict the energy from the final atom representations. Both better account for atom characteristics and graph topology.
4. **Force prediction head**: A head that predicts atomic forces directly, combined with a gradient-based loss term. The loss term encourages energy conservation while the direct head reduces memory usage and computation time.

The authors evaluate PhAST on multiple baseline GNN architectures. PhAST improves the energy mean absolute error (MAE) by 4 to 42% and divides compute time by 3 to 8×, depending on the task and model. It also makes CPU training practical, yielding speedups of up to 40× in highly parallelized settings.
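The graph-construction step and the weighted-sum energy head can be illustrated with a small, dependency-free sketch. This is not the PhAST package's API. The `Graph` container, the function names, and the placeholder atomic number `0` for the super-node are hypothetical choices made here. The tag convention (0 = subsurface, 1 = surface, 2 = adsorbate) is the one used by OC20. Aggregation is shown in its simplest form, collapsing every tag-0 atom into a single super-node; the paper's actual aggregation scheme may differ.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed OC20 tag convention: 0 = subsurface catalyst atom,
# 1 = surface catalyst atom, 2 = adsorbate atom.


@dataclass
class Graph:
    """Hypothetical minimal container for an adsorbate-catalyst graph."""

    atomic_numbers: list[int]
    tags: list[int]
    edges: list[tuple[int, int]]  # directed (source, target) node-index pairs


def remove_tag0_atoms(graph: Graph) -> Graph:
    """Drop tag-0 atoms and every edge touching them, re-indexing the survivors."""
    keep = [i for i, t in enumerate(graph.tags) if t != 0]
    new_index = {old: new for new, old in enumerate(keep)}
    edges = [
        (new_index[s], new_index[d])
        for s, d in graph.edges
        if s in new_index and d in new_index
    ]
    return Graph(
        atomic_numbers=[graph.atomic_numbers[i] for i in keep],
        tags=[graph.tags[i] for i in keep],
        edges=edges,
    )


def aggregate_tag0_atoms(graph: Graph, super_node_z: int = 0) -> Graph:
    """Collapse all tag-0 atoms into one super-node appended as the last node.

    super_node_z is a placeholder atomic number for the super-node (an
    assumption of this sketch). Edges between tag-0 atoms become self-loops
    and are dropped; duplicate edges created by the merge are kept once.
    """
    keep = [i for i, t in enumerate(graph.tags) if t != 0]
    if len(keep) == len(graph.tags):
        return graph  # no tag-0 atoms, nothing to aggregate
    new_index = {old: new for new, old in enumerate(keep)}
    super_idx = len(keep)
    edges: list[tuple[int, int]] = []
    seen: set[tuple[int, int]] = set()
    for s, d in graph.edges:
        edge = (new_index.get(s, super_idx), new_index.get(d, super_idx))
        if edge[0] != edge[1] and edge not in seen:
            seen.add(edge)
            edges.append(edge)
    return Graph(
        atomic_numbers=[graph.atomic_numbers[i] for i in keep] + [super_node_z],
        tags=[graph.tags[i] for i in keep] + [0],
        edges=edges,
    )


def weighted_sum_energy(node_energies: list[float], weights: list[float]) -> float:
    """Graph energy as a weighted sum of per-atom contributions.

    In a trained model both inputs would be produced by learned layers applied
    to the final atom representations; here they are given directly.
    """
    if len(node_energies) != len(weights):
        raise ValueError("need exactly one weight per atom")
    return sum(w * e for w, e in zip(weights, node_energies))


if __name__ == "__main__":
    # Toy slab: three Cu atoms (two subsurface, one surface) plus an OH adsorbate.
    g = Graph(
        atomic_numbers=[29, 29, 29, 8, 1],
        tags=[0, 0, 1, 2, 2],
        edges=[(0, 2), (1, 2), (2, 3), (3, 4), (4, 3)],
    )
    print(remove_tag0_atoms(g).edges)     # [(0, 1), (1, 2), (2, 1)]
    print(aggregate_tag0_atoms(g).edges)  # [(3, 0), (0, 1), (1, 2), (2, 1)]
    print(weighted_sum_energy([1.0, -2.0, 0.5], [0.5, 1.0, 2.0]))  # -0.5
```

Removing tag-0 atoms shrinks both the node and edge sets, which is where the compute savings come from. The super-node variant gives up a little of that saving in exchange for keeping a coarse summary of the subsurface in the graph.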
These improvements can help accelerate the design of electrocatalyst materials, supporting efforts to reduce carbon emissions and develop clean energy.