Multi-view deep learning based molecule design and structural optimization accelerates the SARS-CoV-2 inhibitor discovery

Chao Pang, Yu Wang, Yi Jiang, Ruheng Wang, Ran Su, Leyi Wei
DOI: https://doi.org/10.48550/arXiv.2212.01575
2022-12-03
Abstract: In this work, we propose MEDICO, a Multi-viEw Deep generative model for molecule generation, structural optimization, and the SARS-CoV-2 Inhibitor disCOvery. To the best of our knowledge, MEDICO is the first-of-its-kind graph generative model that can generate molecular graphs similar to the structure of targeted molecules, with a multi-view representation learning framework to sufficiently and adaptively learn comprehensive structural semantics from targeted molecular topology and geometry. We show that MEDICO significantly outperforms the state-of-the-art methods in generating valid, unique, and novel molecules under benchmarking comparisons. In particular, we show that the multi-view deep learning model enables us to generate not only molecules structurally similar to the targeted molecules but also molecules with desired chemical properties, demonstrating the strong capability of our model in exploring the chemical space deeply. Moreover, case study results on targeted molecule generation for the SARS-CoV-2 main protease (Mpro) show that by integrating molecular docking into our model as a chemical prior, we successfully generate new small molecules with desired drug-like properties for the Mpro, potentially accelerating the de novo design of COVID-19 drugs. Further, we apply MEDICO to the structural optimization of three well-known Mpro inhibitors (N3, 11a, and GC376) and achieve ~88% improvement in their binding affinity to Mpro, demonstrating the application value of our model for the development of therapeutics for SARS-CoV-2 infection.
Machine Learning, Biomolecules
What problem does this paper attempt to address?
The paper aims to accelerate the discovery of SARS-CoV-2 inhibitors with a multi-view deep learning model. Specifically:

1. **Molecule Generation and Structural Optimization**: The paper proposes a deep generative model named MEDICO (Multi-viEw Deep generative model for molecule generation, structural optimization, and the SARS-CoV-2 Inhibitor disCOvery). The model generates new molecules that are structurally similar to a target molecule and optimizes their chemical properties, with the goal of speeding up drug design.
2. **Multi-view Representation Learning Framework**: According to the authors, MEDICO is the first graph generative model of its kind to use a multi-view representation learning framework, which adaptively learns structural semantics from both the topology and the geometry of the target molecule. This lets the model generate molecules that are structurally similar to the target and also have the desired chemical properties, which the authors present as evidence that it explores chemical space in depth.
3. **Design of SARS-CoV-2 Main Protease (Mpro) Inhibitors**: By integrating molecular docking results into the model as a chemical prior, MEDICO generates new small molecules with the desired drug-like properties for the SARS-CoV-2 main protease (Mpro), potentially accelerating the de novo design of COVID-19 drugs.
4. **Improving the Binding Affinity of Known Inhibitors**: The paper also applies MEDICO to the structural optimization of three known Mpro inhibitors (N3, 11a, and GC376) and reports an improvement of approximately 88% in their binding affinity to Mpro, demonstrating the model's value for developing therapeutics for SARS-CoV-2 infection.

### Key Technical Points

- **Multi-view Representation Learning**: MEDICO combines the topological structure and the geometric information of molecules. It processes 2D topology with a flow-based model, processes 3D geometry with a message-passing neural network (MPNN), and merges the two into a joint representation.
- **Molecular Docking as Chemical Prior**: Molecular docking results (such as binding energy) guide the model, steering generation toward molecules with potential antiviral activity.
- **Performance Evaluation**: The quality of the generated molecules is measured by validity, uniqueness, and novelty, and their inhibitory potential against Mpro is assessed with molecular docking scores.

In conclusion, MEDICO aims to address the shortcomings of traditional molecule-generation methods in structural similarity and chemical property control, providing a new tool for accelerating the discovery of SARS-CoV-2 inhibitors.
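
The validity, uniqueness, and novelty metrics used in the benchmarking comparison can be illustrated with a minimal sketch. It assumes the definitions commonly used in molecule-generation benchmarks (validity over all generated molecules, uniqueness over the valid ones, novelty over the unique valid ones relative to the training set); the paper's exact definitions may differ. The function names `generation_metrics` and `toy_is_valid` are hypothetical, and the validity check is a placeholder standing in for a real cheminformatics toolkit.

```python
from typing import Callable, Dict, Iterable


def generation_metrics(
    generated: Iterable[str],
    training_set: Iterable[str],
    is_valid: Callable[[str], bool],
) -> Dict[str, float]:
    """Compute validity, uniqueness, and novelty for generated molecules.

    Molecules are assumed to be canonical strings (for example canonical
    SMILES), so that string equality stands in for molecular identity.
    """
    generated = list(generated)
    valid = [m for m in generated if is_valid(m)]   # chemically valid outputs
    unique = set(valid)                             # distinct valid molecules
    novel = unique - set(training_set)              # not seen during training

    def ratio(num: int, den: int) -> float:
        # Return 0.0 instead of dividing by zero on empty inputs.
        return num / den if den else 0.0

    return {
        "validity": ratio(len(valid), len(generated)),
        "uniqueness": ratio(len(unique), len(valid)),
        "novelty": ratio(len(novel), len(unique)),
    }


def toy_is_valid(smiles: str) -> bool:
    """Placeholder validity check (assumption): non-empty, balanced parentheses.

    A real pipeline would parse the string with a cheminformatics toolkit.
    """
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return bool(smiles) and depth == 0


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the paper.
    generated = ["CCO", "CCO", "c1ccccc1", "CC(C", "CCN"]
    training = ["CCO"]
    print(generation_metrics(generated, training, toy_is_valid))
```

In this toy run, one of the five strings fails the placeholder check, one valid string is a duplicate, and one unique molecule already appears in the training set, so each metric falls below 1.0 for a different reason.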