Predicting Molecular Trajectories using Machine Learning Methods

Sonny Young
DOI: https://doi.org/10.26434/chemrxiv-2023-jh199
2023-12-18
Abstract: Predicting molecular trajectories is a cornerstone of computational chemistry, with implications for drug discovery and molecular dynamics simulations. This study presents a comprehensive analysis of various machine learning models for the prediction of aspirin molecular trajectories, as captured in a dataset of 1500 frames calculated via the CCSD [Psi4, cc-pVDZ] method. We explore statistical sampling methods, including random walk and Markov Chain Monte Carlo (MCMC), alongside a suite of neural networks comprising feed-forward, recurrent neural network (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU) architectures. Additionally, we investigate the potential of Equivariant Neural Networks (E3NN) to enforce permutation and rototranslational invariance, as well as Graph Convolutional Networks (GCN) for leveraging the inherent graph structure of molecules. Our results highlight the comparative effectiveness of these methods, with GCNs unexpectedly outperforming others in trajectory prediction accuracy. The study also delves into the novel application of diffusion models, treating molecular pose prediction as a generative problem, despite challenges in maintaining physical plausibility. Though preliminary, our findings underscore the promise of graph-based methods in capturing molecular interactions and dynamics, paving the way for future advancements in efficient and accurate trajectory prediction in computational chemistry.
Chemistry
What problem does this paper attempt to address?
This paper examines how machine learning methods can predict molecular trajectories, that is, the conformational changes of a molecule over time. Inspired by the recent use of generative diffusion models for molecular docking, the paper proposes a machine learning framework for predicting molecular dynamic behavior, with the aim of reducing reliance on computationally intensive methods such as traditional molecular dynamics simulation.

The study uses the Symmetric Gradient Domain Machine Learning (sGDML) dataset, specifically a 1500-frame trajectory of the aspirin molecule. The data are divided into training and test sets, and the goal is a model that predicts the final 150 frames with the lowest root mean square deviation (RMSD).

Several methods are compared:
1. Statistical sampling methods (random walk and Markov Chain Monte Carlo) perform poorly at predicting molecular structural changes, yielding high RMSD.
2. Among the neural networks and sequence models (a simple feed-forward network, RNN, LSTM, and GRU), the RNN performs best, particularly in handling time-dependent data.
3. Equivariant neural networks (E3NN) and graph convolutional networks (GCN) also show good predictive ability: E3NN excels at preserving geometric consistency, while GCN is good at capturing local interactions.
4. Diffusion models, applied here as a novel approach, can generate new data points but struggle to maintain physical plausibility and to capture complex molecular dynamics.

The paper concludes that, despite these difficulties, machine learning models, and graph-based methods in particular, show potential for representing molecules and predicting molecular trajectories. Future work will focus on developing more accurate models to advance computational chemistry and drug discovery.
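The evaluation setup summarized above (hold out the final 150 frames of a 1500-frame trajectory and score predictions by RMSD) might be sketched as follows. This is a minimal illustration, not the paper's code: the trajectory is synthetic stand-in data, the function names `rmsd`, `chronological_split`, and `random_walk_forecast` are hypothetical, the RMSD is computed without any structural alignment, and the Gaussian random walk is only one plausible reading of the "random walk" baseline. The atom count of 21 follows from aspirin's formula, C9H8O4.

```python
import numpy as np


def rmsd(pred, ref):
    """Per-frame RMSD for arrays of shape (frames, atoms, 3).

    Assumption: no rotational/translational alignment is applied first.
    """
    diff = pred - ref
    # Squared distance per atom, averaged over atoms, then square-rooted.
    return np.sqrt((diff ** 2).sum(axis=-1).mean(axis=-1))


def chronological_split(traj, n_test):
    """Keep time order: the last n_test frames form the test set."""
    return traj[:-n_test], traj[-n_test:]


def random_walk_forecast(last_frame, n_steps, step_std, rng):
    """Hypothetical baseline: accumulate Gaussian steps from the last known frame."""
    steps = rng.normal(0.0, step_std, size=(n_steps,) + last_frame.shape)
    return last_frame + np.cumsum(steps, axis=0)


rng = np.random.default_rng(0)

# Synthetic stand-in for the aspirin trajectory (NOT the sGDML data).
n_frames, n_atoms = 1500, 21
start = rng.normal(size=(n_atoms, 3))
drift = rng.normal(0.0, 0.01, size=(n_frames, n_atoms, 3))
traj = start + np.cumsum(drift, axis=0)

train, test = chronological_split(traj, n_test=150)

# Estimate a step size from the training frames only, then forecast.
step_std = np.diff(train, axis=0).std()
pred = random_walk_forecast(train[-1], len(test), step_std, rng)

scores = rmsd(pred, test)
print(train.shape, test.shape, float(scores.mean()))
```

Any of the learned models would replace `random_walk_forecast` in this harness. Splitting chronologically rather than at random keeps the test frames strictly in the future of the training frames, which matches the stated goal of predicting the final 150 frames.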