TacoGFN: Target-conditioned GFlowNet for Structure-based Drug Design

Tony Shen, Seonghwan Seo, Grayson Lee, Mohit Pandey, Jason R Smith, Artem Cherkasov, Woo Youn Kim, Martin Ester
2024-04-08
Abstract: Searching the vast chemical space for drug-like and synthesizable molecules with high binding affinity to a protein pocket is a challenging task in drug discovery. Recently, molecular deep generative models have been introduced that promise to be more efficient than exhaustive virtual screening by directly generating molecules based on the protein structure. However, since they learn the distribution of a limited protein-ligand complex dataset, existing methods struggle to generate novel molecules with significant property improvements. In this paper, we frame the generation task as a Reinforcement Learning task, where the goal is to search the wider chemical space for molecules with desirable properties as opposed to fitting a training data distribution. More specifically, we propose TacoGFN, a Generative Flow Network conditioned on protein pocket structure, using binding affinity, drug-likeness and synthesizability measures as our reward. Empirically, our method outperforms state-of-the-art methods on the CrossDocked2020 benchmark for every molecular property (Vina score, QED, SA), while significantly improving the generation time. TacoGFN achieves $-8.82$ in median docking score and $52.63\%$ in Novel Hit Rate.
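GFlowNets are trained so that finished objects (here, molecules) are sampled with probability proportional to their reward. The abstract does not say which training objective TacoGFN uses, so the sketch below assumes the widely used trajectory balance objective and treats the protein pocket purely as a conditioning signal. It illustrates what "conditioned on protein pocket structure" means at the level of one loss term. The function name `trajectory_balance_loss` is hypothetical, and in a real model the log-probabilities and log Z would be outputs of neural networks rather than plain floats.

```python
import math
from typing import Sequence


def trajectory_balance_loss(
    log_z: float,
    log_pf: Sequence[float],
    log_pb: Sequence[float],
    log_reward: float,
) -> float:
    """Squared trajectory-balance residual for one complete trajectory.

    Assumed setup (not taken from the paper): for a trajectory
    s_0 -> ... -> s_n = x generated for pocket c,
      loss = (log Z(c) + sum_t log P_F(s_{t+1} | s_t; c)
              - log R(x; c) - sum_t log P_B(s_t | s_{t+1}; c)) ** 2

    log_z:      learned log partition function for this pocket
    log_pf:     forward-policy log-probabilities, one per step
    log_pb:     backward-policy log-probabilities, one per step
    log_reward: log reward of the finished molecule for this pocket
    """
    if len(log_pf) != len(log_pb):
        raise ValueError("forward and backward terms must cover the same steps")
    residual = log_z + sum(log_pf) - log_reward - sum(log_pb)
    return residual ** 2


if __name__ == "__main__":
    # Toy two-step trajectory whose terms balance, so the loss is (near) zero.
    loss = trajectory_balance_loss(
        log_z=0.0,
        log_pf=[math.log(0.5), math.log(0.5)],
        log_pb=[0.0, math.log(0.5)],
        log_reward=math.log(0.5),
    )
    print(f"{loss:.3e}")
```

Driving this residual to zero on every trajectory makes the sampler's distribution over finished molecules proportional to the reward for the given pocket, which is why the choice of reward directly shapes what gets generated.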
Machine Learning
What problem does this paper attempt to address?
This paper addresses a core problem in Structure-Based Drug Design (SBDD): searching a vast chemical space for molecules that bind strongly to a specific protein pocket while also being drug-like and synthesizable. Traditional virtual screening is inefficient, whereas deep generative models can generate molecules directly from the protein structure. However, because these models learn from limited protein-ligand complex datasets, they struggle to generate molecules with substantially improved properties.

The paper proposes the Target-conditioned Generative Flow Network (TacoGFN), which reframes molecule generation as a reinforcement learning task: the objective is to search chemical space for molecules with desired properties rather than to fit the training data distribution. TacoGFN uses predicted binding affinity, drug-likeness (QED), and synthesizability (SA) as rewards to guide generation. It also introduces a docking score prediction model that leverages pre-trained pharmacophore representations to evaluate molecules efficiently. On the CrossDocked2020 benchmark, TacoGFN outperforms existing state-of-the-art methods on every reported molecular property (Vina score, QED, SA) while substantially reducing generation time, reaching a median docking score of -8.82 and a Novel Hit Rate of 52.63%. These results target the main weakness of existing methods: generating novel molecules with significantly improved properties.
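The summary names three reward ingredients (predicted binding affinity, drug-likeness, synthesizability) but not how they are combined into one scalar. The sketch below shows one plausible combination, purely for illustration. The normalizations, the `vina_scale` parameter, the class and function names, and the multiplicative form are all assumptions, not TacoGFN's published formula. It assumes a Vina-style docking score in kcal/mol (more negative means stronger binding), QED in [0, 1], and the common SA score range of 1 to 10 (lower means easier to synthesize).

```python
from dataclasses import dataclass


@dataclass
class MoleculeProperties:
    # Predicted Vina-style docking score in kcal/mol; more negative = stronger binding.
    vina_score: float
    # QED drug-likeness, in [0, 1]; higher is more drug-like.
    qed: float
    # Synthetic accessibility (SA) score, assumed in [1, 10]; lower = easier to make.
    sa: float


def clip01(x: float) -> float:
    """Clamp a value into [0, 1]."""
    return max(0.0, min(1.0, x))


def composite_reward(props: MoleculeProperties, vina_scale: float = 12.0) -> float:
    """Hypothetical multi-objective reward (not the paper's exact formula).

    Each property is mapped to [0, 1] so that 1 is best, then the three
    terms are multiplied, so a molecule weak on any one criterion scores low.
    `vina_scale` is an assumed docking-score magnitude (kcal/mol) that
    earns full affinity credit; stronger scores are clipped to 1.
    """
    affinity = clip01(-props.vina_score / vina_scale)
    drug_likeness = clip01(props.qed)
    synthesizability = clip01((10.0 - props.sa) / 9.0)
    return affinity * drug_likeness * synthesizability


if __name__ == "__main__":
    # Example values chosen for illustration only.
    mol = MoleculeProperties(vina_score=-9.0, qed=0.7, sa=3.0)
    print(round(composite_reward(mol), 4))
```

Multiplying the terms behaves like a soft AND over the objectives. A weighted sum would instead let a very strong docking score compensate for poor drug-likeness or synthesizability, which is a different design trade-off.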