Conf-GEM: a geometric information-assisted direct conformation generation model

Zhijiang Yang, Youjun Xu, Li Pan, Tengxin Huang, Yunfan Wang, Junjie Ding, Liangliang Wang, Junhua Xiao
DOI: https://doi.org/10.1016/j.aichem.2024.100074
2024-07-01
Abstract: Molecular conformation generation (MCG) aims to efficiently obtain reasonable and stable three-dimensional (3D) coordinates of the atoms in a molecule from scratch, providing a structural foundation for molecular representation learning models and for advanced downstream molecular design tasks such as molecular property prediction, molecular generation, and molecular docking. Existing MCG methods mostly rely either on indirect distance-based strategies, which can result in geometrically unrealistic conformations, or on direct coordinate-based methods, which have larger search spaces and are prone to overfitting. This study therefore introduces Conf-GEM, a novel geometric information-assisted direct conformation generation model based on E-GeoGNN, a geometrically augmented, multi-scale 3D graph neural network. Pre-training and divide-and-conquer strategies are integrated into the proposed model. Conf-GEM outperforms RDKit and nine deep-learning-based MCG models on the GEOM-QM9 and GEOM-Drugs datasets, achieving conformational coverage of 96.69% and 96.07%, respectively, without force field optimization. It also excels on the X-ray diffraction crystal structure dataset, with up to 97.04% conformational coverage. In conclusion, Conf-GEM provides a novel solution for generating stable 3D conformations. We provide an online prediction service (https://confgem.cmdrg.com) with a user-friendly interface for researchers.
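The results above are reported as conformational coverage. In the GEOM benchmark literature, coverage (COV) is usually defined as the fraction of reference conformers matched by at least one generated conformer whose RMSD, after optimal rigid alignment, is at most a threshold δ (commonly 0.5 Å for GEOM-QM9 and 1.25 Å for GEOM-Drugs in earlier work). The sketch below assumes that definition and NumPy; the function names `kabsch_rmsd` and `coverage` are hypothetical, and nothing here is taken from the Conf-GEM code. It also skips the symmetry-equivalent atom permutations that full evaluation pipelines typically handle, for example with RDKit's `GetBestRMS`.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two (N, 3) conformers after optimal rigid superposition.

    Centering removes translation; the Kabsch rotation removes orientation.
    Atom i in P is assumed to correspond to atom i in Q (no symmetry search).
    """
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    H = Q.T @ P                               # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # -1 would indicate a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T   # proper rotation taking Q onto P
    diff = P - Q @ R.T
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


def coverage(generated, references, threshold):
    """Fraction of reference conformers that have at least one generated
    conformer with aligned RMSD <= threshold (in angstroms)."""
    rmsd = np.array([[kabsch_rmsd(ref, gen) for gen in generated]
                     for ref in references])
    return float(np.mean(rmsd.min(axis=1) <= threshold))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.normal(size=(12, 3))                   # toy reference conformer
    c, s = np.cos(0.7), np.sin(0.7)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    gen = ref @ rot.T + np.array([5.0, -2.0, 1.0])   # rotated, shifted copy
    print(kabsch_rmsd(ref, gen))                     # close to 0
    print(coverage([gen], [ref, 3.0 * ref], threshold=0.5))
```

Because both conformers are centered and superimposed before the RMSD is taken, the score does not depend on where a generated conformer sits or how it is oriented, only on its shape.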
Computer Science, Chemistry
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper addresses key issues in molecular conformation generation (MCG). It proposes Conf-GEM, a geometric information-assisted direct conformation generation model built on E-GeoGNN, a geometrically augmented, multi-scale 3D graph neural network. By combining this network with pre-training and a divide-and-conquer strategy, the model aims to generate reasonable and stable 3D atomic coordinates efficiently.

### Background and Motivation

1. **Importance of Molecular Structure**:
   - The 3D structure of a molecule, especially its most stable equilibrium geometry, largely determines its quantum-chemical, physicochemical, and biological properties.
   - For example, isomers with the same atomic composition can have markedly different melting points because their 3D structures differ.
2. **Limitations of Existing Methods**:
   - **Indirect distance-based methods**: These can produce geometrically unrealistic conformations.
   - **Direct coordinate-based methods**: These have a large search space and are prone to overfitting.
   - **Experimental methods**: X-ray crystallography, cryo-electron microscopy, and nuclear magnetic resonance (NMR) spectroscopy are time-consuming and expensive.
   - **Computational methods**: Quantum mechanics (QM) methods are computationally expensive, while molecular mechanics (MM) methods are less accurate.
3. **Shortcomings of Existing Models**:
   - **CVGAE**: It generates molecular conformations directly but ignores their rotational and translational invariance.
   - **Distance geometry-based methods** such as ETKDG: The generated conformations are of lower quality.
   - **Commercial software** such as OMEGA: It is efficient but cannot capture global interactions between different dihedral angles and is poorly suited to large molecular systems.

### Innovations of Conf-GEM

1. **Geometric Information Assistance**: Incorporating dihedral angle information allows Conf-GEM to generate molecular conformations more accurately.
2. **Pre-Training**: Self-supervised pre-training on large-scale datasets gives the model strong conformation discrimination ability.
3. **Divide-and-Conquer Strategy**: Training on bond lengths, then bond angles, then dihedral angles narrows the model's search space and makes the generated conformations more reasonable.
4. **Rotational, Translational, and Permutation Invariance**: The loss function is designed to be invariant to rotation, translation, and atom permutation, so a generated conformation is judged independently of its absolute orientation, position, and atom ordering.

### Experimental Results

1. **Performance Evaluation**:
   - On the GEOM-QM9 and GEOM-Drugs datasets, Conf-GEM achieved conformation coverage of 96.69% and 96.07%, respectively, without force field optimization, outperforming RDKit and nine existing deep-learning-based MCG models.
   - On the X-ray diffraction crystal structure dataset, Conf-GEM achieved conformation coverage of up to 97.04%.
2. **Model Stability**:
   - Conf-GEM's performance changes little with or without force field optimization, suggesting that the conformations the model generates already have small errors and that the model is not prone to overfitting.

### Conclusion

Conf-GEM offers an innovative way to generate high-quality, diverse 3D molecular conformations while remaining computationally efficient, and its strong performance across multiple datasets indicates its potential in practical applications. The research team also provides an online prediction service (https://confgem.cmdrg.com) for researchers.
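The divide-and-conquer strategy trains bond lengths, bond angles, and dihedral angles in stages, and the model is assisted by dihedral information. The sketch below illustrates what such staged internal-coordinate targets could look like. It is not the authors' implementation: it assumes NumPy, uses hypothetical names (`bond_lengths`, `bond_angles`, `dihedral_angles`, `staged_geometry_loss`, `stage`), and swaps in a plain squared error for lengths and angles and a periodic `1 - cos` error for dihedrals.

```python
import numpy as np


def bond_lengths(x, bonds):
    """Length of each bond (i, j) in an (N, 3) coordinate array."""
    return np.linalg.norm(x[bonds[:, 1]] - x[bonds[:, 0]], axis=1)


def bond_angles(x, angles):
    """Angle (radians) at the central atom j of each triplet (i, j, k)."""
    i, j, k = angles.T
    u, v = x[i] - x[j], x[k] - x[j]
    cos = np.sum(u * v, axis=1) / (
        np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1))
    return np.arccos(np.clip(cos, -1.0, 1.0))


def dihedral_angles(x, torsions):
    """Signed dihedral (radians, in [-pi, pi]) of each quadruplet (i, j, k, l)."""
    i, j, k, l = torsions.T
    b0, b1, b2 = x[j] - x[i], x[k] - x[j], x[l] - x[k]
    n1, n2 = np.cross(b0, b1), np.cross(b1, b2)  # normals of the two planes
    y = np.linalg.norm(b1, axis=1) * np.sum(b0 * n2, axis=1)
    return np.arctan2(y, np.sum(n1 * n2, axis=1))


def staged_geometry_loss(x_pred, x_ref, bonds, angles, torsions, stage):
    """Hypothetical divide-and-conquer loss on internal coordinates.

    stage 1: bond lengths; stage 2: + bond angles; stage 3: + dihedrals.
    Internal coordinates are unchanged by rigid rotation or translation,
    so this loss is invariant to both by construction.
    """
    loss = np.mean((bond_lengths(x_pred, bonds) - bond_lengths(x_ref, bonds)) ** 2)
    if stage >= 2:
        loss += np.mean((bond_angles(x_pred, angles) - bond_angles(x_ref, angles)) ** 2)
    if stage >= 3:
        # 1 - cos(delta) is periodic, so -pi and +pi count as the same torsion.
        delta = dihedral_angles(x_pred, torsions) - dihedral_angles(x_ref, torsions)
        loss += np.mean(1.0 - np.cos(delta))
    return float(loss)


if __name__ == "__main__":
    bonds = np.array([[0, 1], [1, 2], [2, 3]])
    angles = np.array([[0, 1, 2], [1, 2, 3]])
    torsions = np.array([[0, 1, 2, 3]])
    cis = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 0.0],
                    [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
    trans = cis.copy()
    trans[3, 1] = -1.0  # flip the last atom: same bonds and angles, new torsion
    for stage in (1, 2, 3):
        # stages 1 and 2 print 0.0; stage 3 adds 1 - cos(pi) = 2.0
        print(stage, staged_geometry_loss(trans, cis, bonds, angles, torsions, stage))
```

The cis/trans example shows why the later stage matters: the two structures agree on every bond length and bond angle, so only the dihedral term distinguishes them. Because the loss compares atoms by fixed index, it is not permutation-invariant; matching the invariance described for Conf-GEM would need an extra step, such as taking the minimum over symmetry-equivalent atom orderings.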