Generation of 3D Molecules in Pockets via Language Model

Wei Feng, Lvwei Wang, Zaiyun Lin, Yanhao Zhu, Han Wang, Jianqiang Dong, Rong Bai, Huting Wang, Jielong Zhou, Wei Peng, Bo Huang, Wenbiao Zhou
2023-12-11
Abstract: Generative models for molecules based on sequential line notation (e.g., SMILES) or graph representations have attracted increasing interest in the field of structure-based drug design, but they struggle to capture important 3D spatial interactions and often produce undesirable molecular structures. To address these challenges, we introduce Lingo3DMol, a pocket-based 3D molecule generation method that combines language models and geometric deep learning technology. A new molecular representation, fragment-based SMILES with local and global coordinates, was developed to assist the model in learning molecular topologies and atomic spatial positions. Additionally, we trained a separate noncovalent interaction predictor to provide essential binding pattern information for the generative model. Lingo3DMol can efficiently traverse drug-like chemical spaces, preventing the formation of unusual structures. The Directory of Useful Decoys-Enhanced (DUD-E) dataset was used for evaluation. Lingo3DMol outperformed state-of-the-art methods in terms of drug-likeness, synthetic accessibility, pocket binding mode, and molecule generation speed.
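The abstract pairs the fragment-based SMILES representation with local and global coordinates but does not define the local frame. As an illustration only, the sketch below assumes that "local coordinates" means internal coordinates (bond length, bond angle, dihedral angle) measured against three previously placed atoms. It shows the standard NeRF (natural extension reference frame) conversion from such internal coordinates to global Cartesian coordinates. The function names (`place_atom`, `bond_angle`, `dihedral`) are hypothetical and are not taken from Lingo3DMol.

```python
import math

Vec = tuple[float, float, float]


def sub(u: Vec, v: Vec) -> Vec:
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def dot(u: Vec, v: Vec) -> float:
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def cross(u: Vec, v: Vec) -> Vec:
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def normalize(u: Vec) -> Vec:
    n = math.sqrt(dot(u, u))
    return (u[0] / n, u[1] / n, u[2] / n)


def place_atom(a: Vec, b: Vec, c: Vec,
               bond: float, angle_deg: float, dihedral_deg: float) -> Vec:
    """Hypothetical helper (NeRF): return global coordinates of a new atom d with
    |c - d| = bond, angle(b, c, d) = angle_deg, dihedral(a, b, c, d) = dihedral_deg.
    Assumes 'local coordinates' are internal coordinates (an assumption, not the paper's spec)."""
    theta = math.radians(angle_deg)
    phi = math.radians(dihedral_deg)
    # Offset of d in a local frame centred on c, with the x axis along b -> c.
    local = (-bond * math.cos(theta),
             bond * math.sin(theta) * math.cos(phi),
             bond * math.sin(theta) * math.sin(phi))
    # Orthonormal frame built from the three reference atoms.
    bc = normalize(sub(c, b))
    n = normalize(cross(sub(b, a), bc))
    m = cross(n, bc)
    # Rotate the local offset into the global frame and translate it to c.
    return tuple(c[i] + local[0] * bc[i] + local[1] * m[i] + local[2] * n[i]
                 for i in range(3))


def bond_angle(b: Vec, c: Vec, d: Vec) -> float:
    """Angle b-c-d in degrees (used here only to check the placement)."""
    u, v = normalize(sub(b, c)), normalize(sub(d, c))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot(u, v)))))


def dihedral(a: Vec, b: Vec, c: Vec, d: Vec) -> float:
    """Signed dihedral a-b-c-d in degrees, in the same sign convention as place_atom."""
    b0 = sub(a, b)
    b1 = normalize(sub(c, b))
    b2 = sub(d, c)
    v = sub(b0, tuple(dot(b0, b1) * x for x in b1))
    w = sub(b2, tuple(dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))


if __name__ == "__main__":
    a, b, c = (1.0, 2.0, 0.5), (0.3, 0.1, -0.4), (1.5, -0.2, 0.2)
    d = place_atom(a, b, c, bond=1.52, angle_deg=109.5, dihedral_deg=-60.0)
    print(math.dist(c, d), bond_angle(b, c, d), dihedral(a, b, c, d))
```

Placing atoms this way reproduces the requested bond lengths and angles by construction. This property makes internal coordinates attractive for atom-by-atom generation. Whether Lingo3DMol uses exactly this construction is not stated in the text above.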
Machine Learning, Biomolecules
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper addresses the difficulty of generating 3D molecules for structure-based drug design. Existing methods based on sequential line notations (such as SMILES) or graph representations struggle to capture important 3D spatial interactions and often produce undesirable molecular structures. To tackle these issues, the authors propose Lingo3DMol, a pocket-based 3D molecule generation method that combines language models with geometric deep learning techniques.

### Main Contributions

1. **Novel Molecular Representation (FSMILES)**: A fragment-based SMILES representation augmented with local and global coordinates. It helps the model learn both 2D topology and 3D atomic positions, so that generated molecules have reasonable 3D conformations and 2D topological structures.
2. **3D Molecular Denoising Pretraining and NCI/Anchor Prediction**: A 3D molecular denoising pretraining method helps overcome the limited amount of available training data. A separately trained non-covalent interaction (NCI)/anchor point prediction model identifies potential NCI binding sites and supplies this binding-pattern information to the generator.
3. **Outperformance of Existing Methods**: Lingo3DMol outperforms current state-of-the-art methods on metrics including drug-likeness, synthetic accessibility, and pocket binding mode.

### Experimental Results and Discussion

1. **Molecular Geometry Evaluation**: Compared with reference molecules, Lingo3DMol achieves the lowest Jensen-Shannon divergence (JSD) for the interatomic distance distribution. It also produces a more realistic ring-size distribution, with a lower probability of generating large rings of more than 7 atoms.
2. **Molecular Properties and Binding Mode Evaluation**: When the generated molecules are evaluated with Glide, Lingo3DMol excels in min-in-place GlideSP scores and in RMSD relative to low-energy conformers. It also performs well at generating molecules similar to known active compounds.
3. **Information Leakage Analysis**: An analysis of information leakage affecting the benchmarked models shows that Lingo3DMol consistently outperforms Pocket2Mol across different levels of information leakage.

### Conclusion

Lingo3DMol performs strongly at generating drug-like 3D molecules and surpasses existing methods on several key metrics, which shows its potential for drug discovery and design. Future research directions include better capture of non-covalent interactions and improving the model's equivariance.
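The geometry evaluation compares interatomic distance distributions of generated and reference molecules using the Jensen-Shannon divergence. The binning scheme is not given in this summary. The sketch below is a minimal version that assumes a shared fixed-width histogram over both distance samples and base-2 logarithms, which bounds JSD in [0, 1]. The helper names and the default range and bin count in `distance_jsd` are hypothetical choices.

```python
import math
from collections.abc import Sequence
from itertools import combinations


def pairwise_distances(coords: Sequence[tuple[float, float, float]]) -> list[float]:
    """All interatomic distances within one molecule's 3D coordinates."""
    return [math.dist(p, q) for p, q in combinations(coords, 2)]


def histogram(values: Sequence[float], lo: float, hi: float, bins: int) -> list[float]:
    """Normalized histogram over [lo, hi) with equal-width bins; out-of-range values are dropped."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for x in values:
        if lo <= x < hi:
            # min() guards against float rounding pushing x into a nonexistent bin.
            counts[min(int((x - lo) / width), bins - 1)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside [lo, hi)")
    return [c / total for c in counts]


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL divergence in bits; bins with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence: mean KL of p and q to their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def distance_jsd(gen_dists: Sequence[float], ref_dists: Sequence[float],
                 lo: float = 0.0, hi: float = 12.0, bins: int = 60) -> float:
    """Hypothetical metric wrapper: JSD between generated and reference distance histograms."""
    return jsd(histogram(gen_dists, lo, hi, bins), histogram(ref_dists, lo, hi, bins))
```

A lower value means the generated distances are distributed more like the reference set. Because the result depends on the bin range and width, scores are only comparable when every method is binned the same way.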