Diffusion-based Generative AI for Exploring Transition States from 2D Molecular Graphs

Seonghwan Kim, Jeheon Woo, Woo Youn Kim
2023-10-12
Abstract: The exploration of transition state (TS) geometries is crucial for elucidating chemical reaction mechanisms and modeling their kinetics. Recently, machine learning (ML) models have shown remarkable performance in predicting TS geometries. However, they require 3D conformations of reactants and products as input, often with appropriate orientations, which demands substantial effort and computational cost. Here, we propose a generative approach based on the stochastic diffusion method, namely TSDiff, for predicting TS geometries from 2D molecular graphs alone. TSDiff outperformed the existing ML models that use 3D geometries in terms of both accuracy and efficiency. Moreover, it can sample various TS conformations, because it learned the distribution of TS geometries for diverse reactions during training. As a result, TSDiff was able to find more favorable reaction pathways with lower barrier heights than those in the reference database. These results demonstrate that TSDiff has promising potential for efficient and reliable TS exploration.
Chemical Physics, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper proposes TSDiff, a diffusion-based generative model that predicts transition state (TS) geometries from 2D molecular graphs. Existing ML methods require 3D molecular conformations as input; preparing those conformations is computationally expensive, and the predictions are very sensitive to the conformations supplied. TSDiff avoids this by generating TS geometries using only 2D molecular graphs as input.

Specifically, the paper addresses the following key issues:

1. **Simplifying input requirements**: Most existing machine learning models require 3D conformations as input when predicting transition states, whereas TSDiff works with just 2D molecular graphs.
2. **Improving efficiency and accuracy**: By using a diffusion model, TSDiff can efficiently generate multiple possible TS conformations, and these conformations are more accurate than those produced by existing models that take 3D conformations as input.
3. **Exploring favorable reaction pathways**: TSDiff can find transition states with lower energy barriers than those in the reference database, which means it can discover more favorable chemical reaction pathways.

The experimental results in the paper show that TSDiff not only generates conformations that match the transition states in the reference dataset but also finds new TS conformations with lower energy. This is significant for understanding chemical reaction mechanisms and designing catalysts.
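
To make the idea of diffusion-based geometry generation more concrete, the following is a minimal sketch of generic DDPM-style ancestral sampling over atomic coordinates. It is not TSDiff's actual architecture or code. The noise schedule, the function names, and the three-atom coordinates are all assumptions made for illustration, and `toy_noise_predictor` is a hypothetical stand-in for a trained network conditioned on 2D molecular graphs: it returns the exact noise for a data distribution concentrated on one known geometry, which is only possible in a toy setting.

```python
import numpy as np


def make_schedule(num_steps=100, beta_start=1e-4, beta_end=0.05):
    """Linear noise schedule; returns betas, alphas, and cumulative alpha products."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def toy_noise_predictor(x_t, t, alpha_bars, reference):
    """Hypothetical stand-in for a trained, graph-conditioned network.

    Returns the exact noise for a data distribution concentrated on a
    single known geometry (an assumption that holds only in this toy).
    """
    return (x_t - np.sqrt(alpha_bars[t]) * reference) / np.sqrt(1.0 - alpha_bars[t])


def sample_geometry(reference, num_steps=100, seed=0):
    """Run the reverse diffusion loop from Gaussian noise to coordinates."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    x = rng.standard_normal(reference.shape)  # start from pure noise
    for t in reversed(range(num_steps)):
        eps_hat = toy_noise_predictor(x, t, alpha_bars, reference)
        # Mean of x_{t-1} given x_t and the predicted noise
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
        else:
            x = mean  # no noise is added at the final step
    return x


# Made-up coordinates (angstroms) for a three-atom system, for illustration only
reference_geometry = np.array([[0.0, 0.0, 0.0],
                               [1.2, 0.0, 0.0],
                               [1.8, 0.9, 0.0]])
sample = sample_geometry(reference_geometry)
```

Because the toy predictor knows the target exactly, every seed collapses to the same geometry. With a learned predictor in its place, different seeds would be expected to yield different conformations, which is the property the summary above attributes to TSDiff when it samples multiple TS candidates for one reaction.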