RetroGNN: Approximating Retrosynthesis by Graph Neural Networks for De Novo Drug Design

Cheng-Hao Liu, Maksym Korablyov, Stanisław Jastrzębski, Paweł Włodarczyk-Pruszyński, Yoshua Bengio, Marwin H. S. Segler
DOI: https://doi.org/10.48550/arXiv.2011.13042
2020-11-26
Abstract: De novo molecule generation often results in chemically unfeasible molecules. A natural idea to mitigate this problem is to bias the search process towards more easily synthesizable molecules using a proxy for synthetic accessibility. However, using currently available proxies still results in highly unrealistic compounds. We investigate the feasibility of training deep graph neural networks to approximate the outputs of a retrosynthesis planning software, and their use to bias the search process. We evaluate our method on a benchmark involving searching for drug-like molecules with antibiotic properties. Compared to enumerating over five million existing molecules from the ZINC database, our approach finds molecules predicted to be more likely to be antibiotics while maintaining good drug-like properties and being easily synthesizable. Importantly, our deep neural network can successfully filter out hard-to-synthesize molecules while achieving a $10^5$ times speed-up over using the retrosynthesis planning software.
Machine Learning
### What problem does this paper attempt to solve?

This paper addresses the problem that **molecules generated in de novo drug design are often difficult to synthesize**. Existing de novo molecule generation methods frequently produce chemically infeasible or difficult-to-synthesize molecules, which limits their practical use. The authors propose **RetroGNN**, a Graph Neural Network (GNN) trained to approximate the output of retrosynthesis planning software, and apply it to de novo drug design.

#### Main problems and solutions

1. **Generation of infeasible molecules**
   - **Problem**: Traditional de novo molecule generation methods tend to produce a large number of chemically infeasible or difficult-to-synthesize molecules.
   - **Solution**: Train a deep Graph Neural Network (RetroGNN) to approximate the output of retrosynthesis planning software, and use its predictions to bias generation towards more easily synthesizable molecules.
2. **Limitations of existing synthetic accessibility scores**
   - **Problem**: Existing synthetic accessibility scores (such as SAScore and SCScore) are fast but not accurate enough, so searches guided by them can still produce unrealistic molecules.
   - **Solution**: Use RetroGNN to predict the output of the retrosynthesis planning software. The resulting score (RetroGNNScore) is a more accurate measure of synthetic accessibility and is far cheaper to compute than running the planning software itself.
3. **Design of antibiotic drugs**
   - **Problem**: Finding drug-like molecules that have antibiotic activity and are easy to synthesize is a complex multi-objective optimization problem.
   - **Solution**: Guide the search with a combination of RetroGNNScore, QED (Quantitative Estimate of Drug-likeness), and an antibiotic activity score, so that the molecules found are both highly active and easy to synthesize.
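The summary does not say how the three scores are combined, so the following is only a minimal sketch of one plausible scalarization: a weighted geometric mean. The names `Scores`, `synth_ease`, and `combined_reward`, the weights, and the assumption that the activity score lies in [0, 1] are all hypothetical and not taken from the paper. The synthesizability input is assumed to follow a 1 to 10 difficulty scale where lower means easier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scores:
    synth_difficulty: float  # RetroGNNScore-like value, assumed 1..10, lower = easier
    qed: float               # drug-likeness, in [0, 1]
    antibiotic: float        # predicted antibiotic activity, assumed in [0, 1]


def synth_ease(difficulty: float, lo: float = 1.0, hi: float = 10.0) -> float:
    """Map a difficulty score to [0, 1], where 1 means easiest to synthesize.

    Values outside [lo, hi] (for example a "no route found" sentinel above hi)
    are clipped, so they map to an ease of 0.
    """
    clipped = min(max(difficulty, lo), hi)
    return (hi - clipped) / (hi - lo)


def combined_reward(s: Scores, w_synth: float = 1.0,
                    w_qed: float = 1.0, w_act: float = 1.0) -> float:
    """Weighted geometric mean of the three objectives (an assumed choice).

    A geometric mean drives the reward to 0 when any single objective is 0,
    so a molecule cannot compensate for being unsynthesizable with high activity.
    """
    terms = [
        (synth_ease(s.synth_difficulty), w_synth),
        (s.qed, w_qed),
        (s.antibiotic, w_act),
    ]
    total_weight = sum(w for _, w in terms)
    reward = 1.0
    for value, weight in terms:
        reward *= max(value, 0.0) ** (weight / total_weight)
    return reward


if __name__ == "__main__":
    easy = Scores(synth_difficulty=2.0, qed=0.8, antibiotic=0.8)
    hard = Scores(synth_difficulty=9.0, qed=0.8, antibiotic=0.8)
    print(combined_reward(easy), combined_reward(hard))
```

A weighted sum would be an equally reasonable choice; it trades off objectives linearly instead of requiring all of them to be non-zero.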
### Method overview

- **Model training**: The authors trained a deep Graph Neural Network (RetroGNN) to predict the output of the retrosynthesis planning software (Molecule.one), namely M1Score. M1Score reflects how difficult a molecule is to synthesize and ranges from 1 to 10; a value of 11 indicates that no synthesis path was found.
- **Search space**: Three different molecular search spaces are defined, including a fragment-based search space and a graph-edit-based search space.
- **Multi-objective optimization**: A softmax action selection strategy is used during the search, with synthetic accessibility, drug-likeness, and antibiotic activity considered jointly, to find the best-scoring molecules.

### Experimental results

- **Speed improvement**: RetroGNNScore is about $10^5$ times faster to compute than M1Score and can process thousands of molecules per second on a single GPU.
- **Accuracy verification**: RetroGNNScore correlates closely with M1Score, with R² between 0.976 and 0.995 across the different search spaces.
- **Antibiotic molecule discovery**: The search guided by RetroGNNScore found molecules that are predicted to have high antibiotic activity and to be easy to synthesize, outperforming both random search and the known molecules in the ZINC database.

In conclusion, this paper proposes an efficient and accurate method for generating molecules that are both highly active and easy to synthesize in de novo drug design, which supports progress in drug research and development.
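The softmax action selection mentioned in the method overview can be sketched as follows. This is an illustrative example under stated assumptions, not the authors' implementation: the function names, the `temperature` parameter, and the idea that each candidate action (for example adding a fragment or applying a graph edit) already has a scalar reward are all hypothetical.

```python
import math
import random
from typing import Optional, Sequence


def softmax(values: Sequence[float], temperature: float = 1.0) -> list:
    """Convert rewards to selection probabilities.

    Lower temperatures concentrate probability on the best-scoring action;
    higher temperatures make the choice closer to uniform.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [v / temperature for v in values]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(candidate_rewards: Sequence[float],
                  temperature: float = 1.0,
                  rng: Optional[random.Random] = None) -> int:
    """Sample the index of one candidate action with softmax probabilities."""
    rng = rng or random.Random()
    probs = softmax(candidate_rewards, temperature)
    return rng.choices(range(len(candidate_rewards)), weights=probs, k=1)[0]


if __name__ == "__main__":
    # Hypothetical rewards for three candidate edits to the current molecule.
    rewards = [0.20, 0.55, 0.50]
    print(softmax(rewards, temperature=0.1))
    print(select_action(rewards, temperature=0.1, rng=random.Random(0)))
```

Sampling from the softmax instead of always taking the highest-reward action keeps some exploration in the search, which matters when the reward itself is a learned approximation.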