DrugDiff - small molecule diffusion model with flexible guidance towards molecular properties

Marie Oestreich, Erinc Merdivan, Michael Lee, Joachim L. Schultze, Marie Piraud, Matthias Becker
DOI: https://doi.org/10.1101/2024.07.17.603873
2024-07-21
Abstract: With the cost/yield-ratio of drug development becoming increasingly unfavourable, recent work has explored machine learning to accelerate early stages of the development process. Given the current success of deep generative models across domains, we here investigated their application to the property-based proposal of new small molecules for drug development. Specifically, we trained a latent diffusion model - DrugDiff - paired with predictor guidance to generate novel compounds with a variety of desired molecular properties. The architecture was designed to be highly flexible and easily adaptable to future scenarios. Our experiments showed successful generation of unique, diverse and novel small molecules with targeted properties. The code is available at https://github.com/MarieOestreich/DrugDiff.
Bioinformatics
What problem does this paper attempt to address?
The paper addresses the increasingly unfavourable cost/yield ratio of drug development by introducing DrugDiff, a small-molecule diffusion model intended to accelerate the early stages of the process. Specifically, DrugDiff pairs a latent diffusion model with predictor guidance to generate new small molecules with desired molecular properties. Because properties are steered through predictor guidance rather than conditional training, the model remains flexible when new application scenarios arise. Experimental results show that DrugDiff generates unique, diverse, and novel small molecules with the targeted properties. The design is also modular: it can be adapted to different molecular representations and different property predictors, which improves customizability and applicability while minimizing the need for retraining.
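The idea of predictor guidance can be illustrated with a toy example. The sketch below is not the DrugDiff implementation: the denoiser is a hypothetical stand-in that merely shrinks the latent, the property predictor is an assumed linear function, and all names (`toy_denoise`, `predict_property`, `guidance_gradient`, `sample`) are invented for illustration. It only shows the general pattern of nudging each denoising step along the gradient of a separate property predictor, so that the generative model itself needs no property-conditional training.

```python
import random


def toy_denoise(z):
    """Hypothetical stand-in for a trained denoiser: shrinks the latent toward the origin."""
    return [0.9 * zi for zi in z]


def predict_property(z, w):
    """Assumed linear property predictor acting on the latent vector."""
    return sum(wi * zi for wi, zi in zip(w, z))


def guidance_gradient(z, w, target):
    """Gradient of (predict_property(z, w) - target)**2 with respect to z."""
    err = predict_property(z, w) - target
    return [2.0 * err * wi for wi in w]


def sample(w, target, guidance_scale, num_steps=50, seed=0):
    """Run a toy reverse process; guidance_scale=0.0 disables predictor guidance."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in w]  # start from Gaussian noise
    for t in reversed(range(num_steps)):
        z = toy_denoise(z)
        # Guidance: step against the gradient of the property loss.
        grad = guidance_gradient(z, w, target)
        z = [zi - guidance_scale * gi for zi, gi in zip(z, grad)]
        if t > 0:  # re-inject a little noise on all but the final step
            z = [zi + 0.01 * rng.gauss(0.0, 1.0) for zi in z]
    return z


if __name__ == "__main__":
    w = [0.5, -0.25, 1.0, 0.0]  # hypothetical predictor weights
    target = 2.0                # hypothetical desired property value
    for scale in (0.0, 0.1):
        z = sample(w, target, guidance_scale=scale)
        print(f"guidance_scale={scale}: predicted property = {predict_property(z, w):.3f}")
```

Swapping in a different predictor only changes `predict_property` and its gradient; the denoiser is untouched, which is the kind of modularity the summary describes. In a real system the gradient would typically come from automatic differentiation of a trained predictor rather than a hand-written formula.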