Accurately estimating activation energies by leveraging neural network methods and a large dataset

Guo-Jin Cao

DOI: https://doi.org/10.26434/chemrxiv-2024-4qb7s

2024-09-04

Abstract: Determining activation energies is integral to computational chemistry. With the emergence of artificial intelligence, neural networks have been introduced to accelerate the prediction of these energies. A deep-learning framework was developed that uses topological indices, molecular fingerprints of reactants and products, and reaction enthalpy as descriptors. The framework uses the Reaction Graph Depth 1 (RGD1) dataset, which includes 176,992 organic reactions, to estimate activation energies with artificial neural networks (ANNs). The model achieved a training R² of 0.99, with a mean absolute error (MAE) of 2.06 kcal/mol and a root mean square error (RMSE) of 3.20 kcal/mol across an activation energy range of nearly 200 kcal/mol. These results exceed the accuracy of other models on the same dataset as well as on different datasets. In the learning curve, the training and validation losses were nearly identical and minimized, suggesting that the model was effectively regularized. The Chemprop model, with optimized hyperparameters, reached an R² of 0.93 on the test set, slightly below the performance of the ANN method.
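The R², MAE, and RMSE figures quoted in the abstract follow standard regression definitions. As a minimal sketch, assuming plain Python lists of reference and predicted activation energies in kcal/mol (the function name `regression_metrics` and the sample values are illustrative and not taken from the paper):

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for paired reference and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    # Mean absolute error: average magnitude of the residuals.
    mae = sum(abs(r) for r in residuals) / n

    # Root mean square error: penalizes large residuals more than MAE does.
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)

    # Coefficient of determination: 1 - (residual SS / total SS).
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all reference values are equal")
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2

# Hypothetical activation energies in kcal/mol, for illustration only.
mae, rmse, r2 = regression_metrics([10.0, 20.0, 30.0, 40.0],
                                   [12.0, 18.0, 33.0, 39.0])
print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  R2={r2:.3f}")
```

Because RMSE squares each residual before averaging, it is never smaller than MAE; a gap between the two, such as 3.20 versus 2.06 kcal/mol, is what one would expect when a minority of reactions carry larger errors.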

Chemistry

What problem does this paper attempt to address?

The problem that this paper attempts to solve is how to accurately predict the activation energy of chemical reactions using neural network methods and large-scale datasets in the field of computational chemistry. Specifically, by introducing topological indices, molecular fingerprints of reactants and products, and reaction enthalpy as descriptors, the author developed a deep-learning framework aiming to improve the accuracy of activation energy prediction. This research is of great significance for understanding chemical reaction behavior, designing drugs, and innovating catalysts.

The main contributions of the paper are as follows:

1. **Use of datasets**: The Reaction Graph Depth 1 (RGD1) dataset containing 176,992 organic reactions was used.
2. **Model performance**: The R² value of the model on the training set reached 0.99, and the R² value on the test set reached 0.98. The root-mean-square error (RMSE) was 4.15 kcal/mol, and the mean absolute error (MAE) was 2.62 kcal/mol.
3. **Generalization ability**: The model also showed good generalization ability on external test sets (such as the GPOC dataset), further verifying its applicability on different datasets.

Through these methods, the paper shows how to use machine-learning techniques to significantly improve the accuracy of activation energy prediction, thereby reducing the high computational cost of traditional quantum-chemical calculations.
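One way to picture the descriptor scheme is as a single input vector per reaction, passed through a small feed-forward network. The sketch below is an illustration under stated assumptions, not the author's code: the function names, vector layout, layer sizes, and random weights are all hypothetical. A real pipeline would compute the fingerprints and topological indices with a cheminformatics toolkit and fit the weights to RGD1.

```python
import random

def build_features(reactant_fp, product_fp, topo_indices, reaction_enthalpy):
    """Concatenate descriptors into one flat vector (hypothetical layout)."""
    if len(reactant_fp) != len(product_fp):
        raise ValueError("fingerprints must have the same length")
    return ([float(b) for b in reactant_fp]
            + [float(b) for b in product_fp]
            + [float(x) for x in topo_indices]
            + [float(reaction_enthalpy)])

def dense(x, weights, biases, relu=True):
    """One fully connected layer; each row of `weights` feeds one output unit."""
    out = []
    for w_row, b in zip(weights, biases):
        z = sum(w * xi for w, xi in zip(w_row, x)) + b
        out.append(max(z, 0.0) if relu else z)
    return out

def init_params(n_in, n_hidden, seed=0):
    """Random, untrained weights: placeholders for a fitted model."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]],
        "b2": [0.0],
    }

def predict_ea(features, params):
    """Forward pass: ReLU hidden layer, then a single linear output unit."""
    hidden = dense(features, params["w1"], params["b1"])
    return dense(hidden, params["w2"], params["b2"], relu=False)[0]

# Toy inputs: 4-bit fingerprints, two index values, enthalpy in kcal/mol.
x = build_features([1, 0, 1, 0], [0, 1, 1, 0], [2.5, 14.0], -12.3)
params = init_params(n_in=len(x), n_hidden=8)
print(len(x), predict_ea(x, params))  # output is meaningless until trained
```

Including the reaction enthalpy as an input feature gives the network direct access to the thermodynamics of each reaction, which the fingerprints alone do not encode.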

Precise estimation of activation energies in gas-phase chemical reactions via artificial neural network

Enhancing Activation Energy Predictions under Data Constraints Using Graph Neural Networks

Graph to Activation Energy Models Easily Reach Irreducible Errors but Show Limited Transferability

A Neural Network Protocol for Predicting Molecular Bond Energy

CatTSunami: Accelerating Transition State Energy Calculations with Pre-trained Graph Neural Networks

CoeffNet: predicting activation barriers through a chemically-interpretable, equivariant and physically constrained graph neural network

Machine Learning-Based Prediction of Activation Energies for Chemical Reactions on Metal Surfaces

The importance of reaction energy in predicting chemical reaction barriers with machine learning models

Bond Energies from a Diatomics-in-Molecules Neural Network

Transition State Searching Accelerated by Deep Learning Potential

Beyond Independent Error Assumptions in Large GNN Atomistic Models

Ab initio Accuracy Neural Network Potential for Drug-like Molecules

Prediction of energies for reaction intermediates and transition states on catalyst surfaces using graph-based machine learning models

Quantum Mechanics and Machine Learning Synergies: Graph Attention Neural Networks to Predict Chemical Reactivity

Neural Network Based in Silico Simulation of Combustion Reactions

Hard-threshold-Neural-Network based Prediction of Organic Synthetic Outcomes

AI-enhanced chemical paradigm: From molecular graphs to accurate prediction and mechanism

Applying Machine Learning Algorithms to Predict Potential Energies and Atomic Forces during C-H Activation