Diffusion-based generative AI for exploring transition states from 2D molecular graphs

Seonghwan Kim,Jeheon Woo,Woo Youn Kim
DOI: https://doi.org/10.1038/s41467-023-44629-6
IF: 16.6
2024-01-06
Nature Communications
Abstract: The exploration of transition state (TS) geometries is crucial for elucidating chemical reaction mechanisms and modeling their kinetics. Recently, machine learning (ML) models have shown remarkable performance for prediction of TS geometries. However, they require 3D conformations of reactants and products often with their appropriate orientations as input, which demands substantial efforts and computational cost. Here, we propose a generative approach based on the stochastic diffusion method, namely TSDiff, for prediction of TS geometries just from 2D molecular graphs. TSDiff outperforms the existing ML models with 3D geometries in terms of both accuracy and efficiency. Moreover, it enables to sample various TS conformations, because it learns the distribution of TS geometries for diverse reactions in training. Thus, TSDiff finds more favorable reaction pathways with lower barrier heights than those in the reference database. These results demonstrate that TSDiff shows promising potential for an efficient and reliable TS exploration.
multidisciplinary sciences
What problem does this paper attempt to address?
The main objective of this paper is to propose TSDiff, a generative AI method based on a stochastic diffusion process that predicts transition state (TS) geometries directly from two-dimensional (2D) molecular graphs. Exploring transition states is crucial for understanding chemical reaction mechanisms and modeling their kinetics. Conventional TS optimization methods and existing machine learning (ML) models usually require three-dimensional (3D) structures as input, which is computationally expensive to prepare and makes the results sensitive to the input structures.

TSDiff aims to address the following issues:

1. **Reduce the need for 3D structural information**: Most existing models require 3D conformations of the reactants and products as input, often with appropriate relative orientations, which is a burden for users. TSDiff requires only 2D molecular graphs, so no 3D structure preparation is needed.
2. **Improve prediction accuracy and efficiency**: According to the abstract, TSDiff outperforms existing ML models that take 3D geometries as input in both accuracy and efficiency.
3. **Generate multiple possible transition state conformations**: Because it learns the distribution of TS geometries across diverse reactions, TSDiff can sample different TS conformations for the same reaction, which helps find more favorable reaction pathways with lower energy barriers.

In short, TSDiff uses a stochastic diffusion method to generate TS geometries from 2D molecular graphs alone, which reduces the input-preparation workload for users and makes the overall prediction workflow more efficient. Because it can sample multiple TS conformations, it also helps uncover reaction pathways with lower barrier heights than those in the reference database, which is valuable for understanding and designing chemical reactions.
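To make the idea of "generating a geometry by stochastic diffusion" more concrete, the following is a minimal sketch of a generic DDPM-style reverse sampling loop over atomic coordinates. It is not the TSDiff implementation: the noise schedule, the function names (`linear_betas`, `cumulative_alphas`, `make_toy_denoiser`, `sample_geometry`), and the three-atom target geometry are all assumptions made for illustration. In particular, the toy denoiser is a hypothetical stand-in for a trained network conditioned on the 2D reaction graph. It simply knows the answer, so every run collapses to the same geometry, whereas a trained model would yield different conformations for different noise draws.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (an assumed, commonly used schedule)."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta_t), usually written alpha-bar_t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def make_toy_denoiser(target, alpha_bars):
    """Hypothetical stand-in for a trained, graph-conditioned noise predictor.

    It returns the exact noise that would turn `target` into the noisy input,
    so the reverse process is guaranteed to end at `target`.
    """
    def predict_noise(coords, t):
        a_bar = alpha_bars[t]
        scale = math.sqrt(1.0 - a_bar)
        return [
            [(x - math.sqrt(a_bar) * x0) / scale for x, x0 in zip(atom, atom0)]
            for atom, atom0 in zip(coords, target)
        ]
    return predict_noise


def sample_geometry(num_atoms, betas, predict_noise, rng):
    """Run the reverse diffusion chain from Gaussian noise to coordinates."""
    alpha_bars = cumulative_alphas(betas)
    # Start from pure noise: one random 3D position per atom.
    coords = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(num_atoms)]
    for t in reversed(range(len(betas))):
        eps = predict_noise(coords, t)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        inv_sqrt_alpha = 1.0 / math.sqrt(1.0 - betas[t])
        # Fresh noise is injected at every step except the last one.
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0
        coords = [
            [inv_sqrt_alpha * (x - coef * e) + sigma * rng.gauss(0.0, 1.0)
             for x, e in zip(atom, eps_atom)]
            for atom, eps_atom in zip(coords, eps)
        ]
    return coords


if __name__ == "__main__":
    betas = linear_betas(50)
    # Made-up three-atom geometry used only as the toy denoiser's target.
    target = [[0.0, 0.0, 0.0], [1.2, 0.0, 0.0], [1.2, 1.1, 0.0]]
    denoiser = make_toy_denoiser(target, cumulative_alphas(betas))
    geometry = sample_geometry(3, betas, denoiser, random.Random(0))
    for atom in geometry:
        print([round(x, 4) for x in atom])
```

The sketch works in raw Cartesian coordinates and ignores rotational and translational symmetry, which a real molecular model has to handle. Its only purpose is to show the shape of the sampling loop: start from noise, repeatedly subtract the predicted noise, and stop adding randomness at the final step.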