Riemannian Denoising Score Matching for Molecular Structure Optimization with Accurate Energy

Jeheon Woo, Seonghwan Kim, Jun Hyeong Kim, Woo Youn Kim
2024-11-29
Abstract: This study introduces a modified score matching method aimed at generating molecular structures with high energy accuracy. The denoising process of score matching or diffusion models mirrors molecular structure optimization, where scores act like physical force fields that guide particles toward equilibrium states. To achieve energetically accurate structures, it can be advantageous to have the score closely approximate the gradient of the actual potential energy surface. Unlike conventional methods that simply design the target score based on structural differences in Euclidean space, we propose a Riemannian score matching approach. This method represents molecular structures on a manifold defined by physics-informed internal coordinates to efficiently mimic the energy landscape, and performs noising and denoising within this space. Our method has been evaluated by refining several types of starting structures on the QM9 and GEOM datasets, demonstrating that the proposed Riemannian score matching method significantly improves the accuracy of the generated molecular structures, attaining chemical accuracy. The implications of this study extend to various applications in computational chemistry, offering a robust tool for accurate molecular structure prediction.
Machine Learning, Chemical Physics
What problem does this paper attempt to address?
This paper aims to improve energy accuracy in molecular structure optimization. Conventional quantum-chemical methods such as density functional theory (DFT) give relatively accurate energies, but they are computationally expensive and scale poorly to large systems. Machine-learning methods, especially those based on denoising score matching (DSM), have made significant progress in molecular structure prediction but have not yet reached chemical accuracy (usually defined as an energy error below 1 kcal/mol). This limits their reliability in applications that require high accuracy.

To address this, the authors propose Riemannian denoising score matching (R-DSM). The method represents molecular structures with physics-informed internal coordinates and performs noising and denoising on the Riemannian manifold those coordinates define, so that the learned score reflects the molecular potential energy surface more faithfully. R-DSM consists of the following steps:

1. **Design of the physics-informed Riemannian manifold**: Molecular structures are represented by physics-informed internal coordinates (such as bond lengths and bond angles), which define the Riemannian metric tensor \( g_{ij} \) and the line element \( ds^2 \):

\[ g_{ij}=\frac{\partial q_e}{\partial x_i}\frac{\partial q_e}{\partial x_j} \]

\[ ds^2 = g_{ij}dx_i dx_j \]

where \( x_i \) and \( x_j \) are the mass-weighted Cartesian coordinates of the \( i \)-th and \( j \)-th atoms, \( q_e \) are the physics-informed internal coordinates, and summation over the repeated indices (\( e \) in the first equation, \( i \) and \( j \) in the second) is implied.

2. **Noise sampling on the Riemannian manifold**: Unlike conventional noise sampling in Euclidean space, R-DSM generates noisy structures on the Riemannian manifold using an approximation of Brownian motion:

\[ x_t=\exp_{x_0}(\sigma_t\epsilon_t) \]

where \( x_0 \) is the initial structure, \( \sigma_t \) is the noise scale at time \( t \), \( \epsilon_t\sim\mathcal{N}(0, I) \) is a Gaussian noise vector, and \( \exp_{x_0} \) is the exponential map at \( x_0 \), which follows the geodesic of the manifold that starts at \( x_0 \) in the direction of the noise vector.

3. **Riemannian denoising score matching**: The model is trained to learn the score function on the Riemannian manifold. This score is intended to approximate the gradient of the actual potential energy surface more closely than a score defined in Euclidean space, which allows molecular structures to be generated more accurately.

4. **Performance evaluation**: The authors ran experiments on the QM9 and GEOM datasets to evaluate R-DSM on molecular structure prediction and conformation generation tasks. The results show that R-DSM outperforms conventional DSM in both structural accuracy and energy accuracy, and that it reaches chemical accuracy.

In conclusion, the paper improves the energy accuracy of molecular structure optimization by introducing Riemannian denoising score matching, providing a more reliable molecular structure prediction tool for fields such as drug discovery and materials science.
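The metric construction and the noising step described above can be sketched numerically. The following is a minimal illustration under stated assumptions, not the authors' implementation: the internal coordinates \( q_e \) are taken to be plain bond lengths, the coordinates are ordinary (not mass-weighted) Cartesian coordinates, a small `eps` term is added to keep the metric invertible, and the exponential map is replaced by a single first-order step that is only reasonable for small \( \sigma_t \). The function names (`bond_jacobian`, `metric`, `noise_structure`) and the example geometry are hypothetical.

```python
import numpy as np


def bond_jacobian(x, bonds):
    """J[e, i] = dq_e/dx_i for bond-length coordinates q_e = |r_a - r_b|."""
    pos = x.reshape(-1, 3)
    J = np.zeros((len(bonds), x.size))
    for e, (a, b) in enumerate(bonds):
        d = pos[a] - pos[b]
        u = d / np.linalg.norm(d)  # unit vector along the bond
        J[e, 3 * a:3 * a + 3] = u
        J[e, 3 * b:3 * b + 3] = -u
    return J


def metric(x, bonds, eps=0.1):
    """g_ij = sum_e (dq_e/dx_i)(dq_e/dx_j), i.e. J^T J.

    The eps * I term is an assumption of this sketch: J^T J alone is
    singular (translations, rotations, and other soft motions leave bond
    lengths unchanged), so a small regularizer keeps g invertible.
    """
    J = bond_jacobian(x, bonds)
    return J.T @ J + eps * np.eye(x.size)


def noise_structure(x0, bonds, sigma, noise, eps=0.1):
    """First-order stand-in for x_t = exp_{x0}(sigma * noise).

    The tangent vector is v = sigma * g^{-1/2} @ noise, so its squared
    length measured by g equals sigma^2 * |noise|^2. The geodesic is then
    approximated by one straight step x0 + v (valid only for small sigma).
    """
    g = metric(x0, bonds, eps)
    w, V = np.linalg.eigh(g)               # g is symmetric positive definite
    g_inv_sqrt = (V / np.sqrt(w)) @ V.T    # V diag(w^-1/2) V^T
    v = sigma * g_inv_sqrt @ noise
    return x0 + v, v


# Hypothetical water-like geometry in angstrom (illustrative values only).
x0 = np.array([0.0, 0.0, 0.0, 0.96, 0.0, 0.0, -0.24, 0.93, 0.0])
bonds = [(0, 1), (0, 2)]
rng = np.random.default_rng(0)
noise = rng.standard_normal(x0.size)
x_t, v = noise_structure(x0, bonds, sigma=0.05, noise=noise)
```

In this sketch, directions that stretch a bond correspond to large eigenvalues of the metric and therefore receive less noise than soft directions. This is the qualitative effect that motivates noising in internal coordinates rather than in Euclidean space. A faithful implementation would integrate the geodesic equation instead of taking a single step.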