Molecular conformer search with low-energy latent space

Xiaomi Guo, Lincan Fang, Yong Xu, Wenhui Duan, Patrick Rinke, Milica Todorović, Xi Chen
DOI: https://doi.org/10.48550/arXiv.2203.14012
2022-03-26
Abstract: Identifying low-energy conformers with quantum mechanical accuracy for molecules with many degrees of freedom is challenging. In this work, we use the molecular dihedral angles as features and explore the possibility of performing molecular conformer search in a latent space with a generative model named variational auto-encoder (VAE). We bias the VAE towards low-energy molecular configurations to generate more informative data. In this way, we can effectively build a reliable energy model for the low-energy potential energy surface. After the energy model has been built, we extract local-minimum conformations and refine them with structure optimization. We have tested and benchmarked our low-energy latent-space (LOLS) structure search method on organic molecules with 5 to 9 searching dimensions. Our results agree with previous studies.
Computational Physics, Disordered Systems and Neural Networks, Chemical Physics
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

The paper addresses the problem of identifying low-energy conformers of molecules with many degrees of freedom. Specifically, the authors propose a method that uses a variational auto-encoder (VAE) to perform the conformer search in a latent space, generating more low-energy data and building a reliable model of the low-energy potential energy surface. This method can reduce computational cost and improve search efficiency.

### Main Research Background

1. **Challenges of Molecular Conformations**:
   - Organic molecules are usually very flexible, and any molecule with rotatable bonds can adopt multiple energetically accessible conformations, each associated with different chemical and electronic properties.
   - Identifying low-energy molecular conformations and determining their energy ranking is an important topic in computational chemistry, cheminformatics, computational drug design, and structure-based virtual screening.
   - As the size of the molecule increases, the dimensionality of the conformation space and the complexity of the energy landscape increase dramatically, making molecular conformation search a persistent challenge in molecular modeling.
2. **Limitations of Existing Methods**:
   - Systematic methods sample all possible torsion angles on a grid, but this approach is only suitable for small molecules because the computational cost rises rapidly with the number of search dimensions.
   - Stochastic methods such as Monte Carlo annealing, minima hopping, basin hopping, and genetic algorithms can be applied in high-dimensional search spaces, but because the process is random, a large number of samples is required to obtain converged results.
   - Hierarchical methods first scan most of the conformation space and then optimize promising candidate conformations with more expensive and accurate calculations. However, different levels of simulation accuracy may predict different potential energy surfaces (PES), so many structures still have to be optimized at the higher accuracy level.
3. **Applications of Machine Learning**:
   - In recent years, machine learning techniques such as artificial neural networks, Gaussian process regression (GPR), and machine learning force fields have been successfully applied to accelerate structure-to-energy prediction and geometry optimization.
   - However, these schemes usually require training on large datasets, which often need expensive ab initio calculations.

### Proposed Method

1. **Low-Energy Latent Space (LOLS) Structure Search Method**:
   - Use molecular dihedral angles as features and perform the conformation search in the latent space.
   - Use a VAE to generate samples, and guide the VAE towards low-energy molecular conformations through a regularization term in the loss function.
   - After generating samples in the latent space, decode them back to the actual space and calculate their DFT energies.
   - Fit the energy model in the actual space with a GPR model, extract local-minimum conformations, and refine them with structure optimization.
2. **Experimental Validation**:
   - The LOLS method was tested on cysteine and four peptides (WG, GFA, GGF, WGG).
   - These molecules were chosen for three reasons: amino acids and peptides are important biomolecules; peptides are very flexible and have complex potential energy surfaces, which makes them suitably challenging systems for conformation search; and reference data are available from previous studies.

### Results and Discussion

1. **VAE Training Process and Sample Analysis**:
   - The VAE training process for cysteine was analyzed, showing that the training loss, latent space scale, and average sample energy were all within reasonable ranges.
   - The data distribution in the latent space was uniform, and as the β value decreased, low-energy regions gradually formed.
2. **Correspondence Between Latent Space and Target Conformations**:
   - The latent space was discretized into a 400×400 grid, and all points were decoded back to the actual space and assigned to target conformations.
   - The island areas accounted for 6.1%, 3.7%, and 7.3% of the latent space for β = 0, -1, and -3, respectively.
   - The low-energy regions covered 36.2%, 26.6%, and 36.8% of the latent space for β = 0, -1, and -3, respectively.
3. **Performance Evaluation of the LOLS Method**:
   - In nine parallel runs, β = -3 gave the best results and β = 0 the worst, so negative β values are recommended.
   - The LOLS method in cysteine...
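
The latent-space grid analysis described under Results and Discussion can be illustrated with a minimal sketch. It assumes a two-dimensional latent space and uses toy stand-ins throughout: `decode`, `toy_energy`, the latent range `[-3, 3]`, and the energy threshold are hypothetical placeholders, not the paper's trained decoder, energy model, or settings. Only the 400×400 grid resolution comes from the summary. The sketch shows how a coverage fraction, such as the share of latent space occupied by low-energy regions, could be computed.

```python
import math

GRID = 400  # grid resolution per latent axis, as stated in the summary


def decode(z1, z2):
    """Hypothetical stand-in for a trained VAE decoder.

    Maps a 2D latent point to two dihedral angles in degrees.
    """
    return (180.0 * math.tanh(z1), 180.0 * math.tanh(z2))


def toy_energy(angles):
    """Hypothetical energy in arbitrary units.

    A real workflow would evaluate the fitted energy model or DFT instead.
    """
    return sum(1.0 - math.cos(math.radians(a)) for a in angles)


def low_energy_fraction(z_min=-3.0, z_max=3.0, threshold=1.0):
    """Fraction of grid points whose decoded structure is below the threshold."""
    step = (z_max - z_min) / (GRID - 1)
    low = 0
    for i in range(GRID):
        for j in range(GRID):
            z1 = z_min + i * step
            z2 = z_min + j * step
            if toy_energy(decode(z1, z2)) < threshold:
                low += 1
    return low / (GRID * GRID)


fraction = low_energy_fraction()
print(f"{GRID * GRID} grid points, low-energy fraction = {fraction:.3f}")
```

Counting grid points is a simple way to estimate the area of a region in latent space: with a uniform grid, each point represents an equal share of the scanned area, so the fraction of points that meet a criterion approximates the fraction of latent space covered.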