DrugSynthMC: an atom based generation of drug-like molecules with Monte Carlo Search

Filippo Prischi, Milo Roucairol, Alexios Georgiou, Tristan Cazenave, Olivier E. Pardo
DOI: https://doi.org/10.26434/chemrxiv-2024-l2969
2024-05-29
Abstract: A growing number of Deep Learning (DL) methodologies have recently been developed to design novel compounds and expand the chemical space within virtual libraries. Most of these Neural Network approaches design molecules to specifically bind a target, based on its structural information and/or knowledge of previously identified binders. Fewer attempts have been made to develop approaches for de novo design of virtual libraries, as synthesizability of generated molecules remains a challenge. In this work, we developed a new Monte Carlo Search (MCS) algorithm, DrugSynthMC (Drug Synthetise using Monte Carlo), in conjunction with DL and statistical-based priors to generate thousands of interpretable chemical structures and novel drug-like molecules per second. DrugSynthMC produces drug-like compounds using an atom-based search model that builds molecules as SMILES, character by character. Designed molecules follow Lipinski’s “rule of 5”, show a high proportion of compounds predicted to be synthesizable, and efficiently expand the chemical space within the libraries, without relying on training datasets or synthesizability metrics and without enforcing synthesizability during SMILES generation. Our approach can function with or without an underlying Neural Network and is thus easily explainable and versatile. This ease in drug-like molecule generation allows for future integration of score functions aimed at different target- or job-oriented goals. Thus, DrugSynthMC is expected to enable the functional assessment of large compound libraries covering an extensive novel chemical space, overcoming the limitations of existing drug collections. The software is available at https://github.com/RoucairolMilo/DrugSynthMC
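As a concrete reading of the "rule of 5" constraint mentioned in the abstract, the sketch below checks the four standard Lipinski thresholds (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors) on precomputed descriptors. The `Descriptors` class, the function names, and the `max_violations` parameter are illustrative assumptions, not part of DrugSynthMC; in practice the descriptor values would come from a cheminformatics toolkit, and the aspirin numbers are approximate.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container for this sketch)."""
    mol_weight: float       # molecular weight in daltons
    log_p: float            # octanol-water partition coefficient
    h_bond_donors: int      # number of hydrogen-bond donors
    h_bond_acceptors: int   # number of hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four Lipinski thresholds are exceeded."""
    return sum([
        d.mol_weight > 500,
        d.log_p > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ])


def passes_rule_of_five(d: Descriptors, max_violations: int = 0) -> bool:
    """Assumption: a strict filter by default; some workflows tolerate one violation."""
    return lipinski_violations(d) <= max_violations


# Approximate, illustrative values for aspirin.
aspirin = Descriptors(mol_weight=180.16, log_p=1.2, h_bond_donors=1, h_bond_acceptors=4)
print(passes_rule_of_five(aspirin))
```

A filter of this kind could serve as a score or acceptance test for finished SMILES strings; how DrugSynthMC itself applies the rule is described in the paper and repository, not here.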
Chemistry
What problem does this paper attempt to address?
This paper presents DrugSynthMC, a method that combines Monte Carlo Search with deep learning and statistical priors to rapidly generate interpretable chemical structures and drug-like molecules. Although many deep learning methods design compounds against specific targets, the difficulty of ensuring synthesizability has limited de novo design of virtual libraries. DrugSynthMC uses an atom-based search model to build SMILES strings that follow Lipinski's "rule of 5" and include a high proportion of compounds predicted to be synthesizable, expanding chemical space without relying on training datasets or enforcing synthesizability during generation. Because the method does not depend on a neural network, it is easier to interpret and can be adapted to target- or task-specific compound libraries. The paper compares several search algorithms (such as UCT and Nested Monte Carlo Search, NMCS) and several playout policies (such as random, forced, n-gram, and neural network) to evaluate their performance. The experiments show that NMCS combined with the n-gram playout policy generates novel drug-like molecules with high predicted synthesizability while maintaining generation speed. The paper also checks the validity, novelty, and synthetic feasibility of the generated molecules with tools such as RDKit and AiZynthFinder. Overall, the results indicate that DrugSynthMC can efficiently generate large numbers of valid, unique, and potentially synthesizable drug-like molecules, offering a new way to expand the chemical diversity of virtual libraries for future drug discovery.