PILOT: Equivariant diffusion for pocket conditioned de novo ligand generation with multi-objective guidance via importance sampling

Julian Cremer, Tuan Le, Frank Noé, Djork-Arné Clevert, Kristof T. Schütt
2024-05-24
Abstract: The generation of ligands that are both tailored to a given protein pocket and exhibit a range of desired chemical properties is a major challenge in structure-based drug design. Here, we propose an in-silico approach for the $\textit{de novo}$ generation of 3D ligand structures using the equivariant diffusion model PILOT, combining pocket conditioning with large-scale pre-training and property guidance. Its multi-objective trajectory-based importance sampling strategy is designed to direct the model towards molecules that not only exhibit desired characteristics such as increased binding affinity for a given protein pocket but also maintain high synthetic accessibility. This ensures the practicality of sampled molecules, thus maximizing their potential for the drug discovery pipeline. PILOT significantly outperforms existing methods across various metrics on the common benchmark dataset CrossDocked2020. Moreover, we employ PILOT to generate novel ligands for unseen protein pockets from the Kinodata-3D dataset, which encompasses a substantial portion of the human kinome. The generated structures exhibit predicted $IC_{50}$ values indicative of potent biological activity, which highlights the potential of PILOT as a powerful tool for structure-based drug design.
Biomolecules; Artificial Intelligence; Computational Engineering, Finance, and Science; Machine Learning
What problem does this paper attempt to address?
The paper proposes PILOT, a method that addresses a major challenge in structure-based drug design: generating ligands (drug molecules) that both fit a specific protein pocket and possess desired chemical properties. By combining pocket conditioning with large-scale pretraining and property guidance in an equivariant diffusion model, PILOT aims to generate 3D ligand structures with high binding affinity and high synthetic accessibility. According to the paper, existing methods may generate molecules that are difficult to synthesize or have poor chemical characteristics.

The PILOT workflow consists of three stages: unconditional diffusion pretraining, pocket-conditioned fine-tuning, and property-guided inference. During inference, importance sampling steers the model toward molecules with the desired properties, such as high binding affinity and high synthetic accessibility. PILOT outperforms existing methods on the standard benchmark dataset CrossDocked2020 and generates novel ligands for unseen protein pockets whose predicted IC50 values indicate potent biological activity.

The results indicate that pretraining is crucial for model performance, especially for capturing complex 3D molecular structures. Multi-objective optimization combined with importance sampling lets PILOT generate molecules that are easier to synthesize and more drug-like while maintaining high binding affinity. The paper also demonstrates PILOT on the Kinodata-3D dataset by generating ligands that target different kinases. In conclusion, PILOT provides a tool for structure-based drug discovery that improves the efficiency and quality of ligand generation and thereby supports the drug development process.
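
The idea of steering samples by importance weights over several objectives can be illustrated with a minimal sketch. This is not the paper's implementation: the summary above does not give PILOT's weighting scheme, property predictors, or resampling schedule, so the softmax weighting, the linear combination of objectives, and every name below (`combine_objectives`, `importance_weights`, `resample`, `temperature`) are assumptions made for illustration. Scores are assumed to be oriented so that higher is better.

```python
import math
import random


def combine_objectives(affinity, sa, w_affinity=0.5, w_sa=0.5):
    """Hypothetical multi-objective score: a weighted sum of two
    per-candidate property scores (higher is better for both)."""
    return [w_affinity * a + w_sa * s for a, s in zip(affinity, sa)]


def importance_weights(scores, temperature=1.0):
    """Turn scores into normalized importance weights with a softmax.
    Subtracting the maximum keeps the exponentials numerically stable."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def resample(population, weights, rng):
    """Draw a new population of the same size, with replacement,
    in proportion to the importance weights."""
    return rng.choices(population, weights=weights, k=len(population))


if __name__ == "__main__":
    rng = random.Random(0)
    # Placeholder candidates standing in for intermediate ligand states.
    candidates = ["lig_a", "lig_b", "lig_c", "lig_d"]
    # Made-up predictor outputs, for illustration only.
    affinity = [0.2, 0.9, 0.5, 0.7]
    sa = [0.8, 0.4, 0.6, 0.9]
    scores = combine_objectives(affinity, sa)
    weights = importance_weights(scores, temperature=0.1)
    print(resample(candidates, weights, rng))
```

In a trajectory-based setting, a step like this would be applied to a batch of intermediate samples at selected denoising steps, so that candidates predicted to score well are duplicated and the rest are dropped before denoising continues. The `temperature` parameter in this sketch controls how sharply the resampling favors the top-scoring candidates.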