Precise estimation of activation energies in gas-phase chemical reactions via artificial neural network

Guo-Jin Cao, Sheng-Jie Lu
DOI: https://doi.org/10.26434/chemrxiv-2024-10h93
2024-07-30
Abstract: This study presents several machine learning (ML) models for predicting the barrier heights (BHs) of gas-phase chemical reactions. The input features for six distinct models were derived from structural and thermodynamic attributes of the molecules, including enthalpy, topological indices, and Morgan fingerprints generated from SMILES, using a dataset of 5040 decomposition reaction records from the Gas Phase Organic Chemistry database. Model performance was evaluated with the coefficient of determination, mean absolute error, and root mean square error. Artificial neural networks outperformed the other models. We then used Morgan fingerprints of different dimensions as inputs to the neural network models and trained them with varying numbers of hidden layers. This led to slight improvements in predictive performance for gas-phase decomposition reactions, giving an average coefficient of determination of 0.965 and a mean absolute error of 0.079 eV. The model was subsequently retrained on a comprehensive dataset covering a wide range of chemical reactions. The results indicate that the artificial neural network approach can generalize and adapt to a wider range of chemical reactions.
Chemistry
What problem does this paper attempt to address?
The problem this paper addresses is: **how to accurately predict the barrier heights (BHs) of gas-phase chemical reactions with machine-learning models, especially artificial neural networks (ANNs)**.

### Problem Background

Activation energy is one of the key factors that determine the rate, mechanism, and outcome of a chemical reaction. Traditional quantum-mechanical methods (such as density functional theory, DFT) can calculate activation energies accurately, but they require large amounts of computational resources and time, especially for complex chemical systems. Researchers therefore seek a more efficient and widely applicable way to predict activation energies.

### Research Objectives

1. **Develop an efficient machine-learning model**: Use machine-learning techniques, especially artificial neural networks, to build a model that quickly and accurately predicts the activation energy of gas-phase chemical reactions.
2. **Evaluate the performance of different models**: Compare multiple machine-learning models (such as random forest and support vector regression) on activation-energy prediction and identify the best one.
3. **Optimize the neural network structure**: Improve prediction accuracy further by adjusting the number of hidden layers and the input features (such as the dimension of the Morgan fingerprints) of the neural network.

### Data Sources and Features

This study used 5,040 decomposition reaction records from the Gas Phase Organic Chemistry (GPOC) database. Input features include:

- Molecular structure properties (such as topological indices)
- Thermodynamic properties (such as enthalpy values)
- Morgan fingerprints (extracted from SMILES strings)

### Main Results

- **The artificial neural network model performs best**: Among all the machine-learning models tested, the artificial neural network performs best, with an average coefficient of determination \( R^2 \) of 0.965 and a mean absolute error (MAE) of 0.079 eV.
- **Optimization of the neural network structure**: Adjusting the number of hidden layers and the dimension of the Morgan fingerprints slightly improved the model's performance.
- **Analysis of important features**: SHAP analysis shows that enthalpy values, Morgan fingerprints, and topological indices are important features affecting the model output, and that the features of the products have a more significant impact on the model.

In conclusion, this study shows the potential of artificial neural networks for predicting the activation energy of gas-phase chemical reactions and provides a useful reference for future research.
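
The three evaluation metrics named in the abstract (coefficient of determination, mean absolute error, root mean square error) can be illustrated with a minimal sketch. The function name `regression_metrics` and the sample values are assumptions made for illustration; they are not the authors' code or data, and the paper does not state which library was used to compute these metrics.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (r2, mae, rmse) for paired reference and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    # Mean absolute error: average magnitude of the residuals.
    mae = sum(abs(r) for r in residuals) / n

    # Root mean square error: penalizes large residuals more strongly.
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)

    # Coefficient of determination: 1 - (residual SS / total SS).
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all reference values are equal")
    r2 = 1.0 - ss_res / ss_tot
    return r2, mae, rmse


# Hypothetical barrier heights in eV, chosen only to exercise the function.
reference = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2, mae, rmse = regression_metrics(reference, predicted)
print(f"R^2 = {r2:.3f}, MAE = {mae:.3f} eV, RMSE = {rmse:.3f} eV")
```

MAE and RMSE carry the unit of the target (eV here), while \( R^2 \) is dimensionless, which is why the summary quotes 0.079 eV alongside a unitless 0.965.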