TacoGFN: Target-conditioned GFlowNet for Structure-based Drug Design

Tony Shen, Seonghwan Seo, Grayson Lee, Mohit Pandey, Jason R Smith, Artem Cherkasov, Woo Youn Kim, Martin Ester

2024-04-08

Abstract: Searching the vast chemical space for drug-like and synthesizable molecules with high binding affinity to a protein pocket is a challenging task in drug discovery. Recently, molecular deep generative models have been introduced which promise to be more efficient than exhaustive virtual screening, by directly generating molecules based on the protein structure. However, since they learn the distribution of a limited protein-ligand complex dataset, the existing methods struggle with generating novel molecules with significant property improvements. In this paper, we frame the generation task as a Reinforcement Learning task, where the goal is to search the wider chemical space for molecules with desirable properties as opposed to fitting a training data distribution. More specifically, we propose TacoGFN, a Generative Flow Network conditioned on protein pocket structure, using binding affinity, drug-likeness and synthesizability measures as our reward. Empirically, our method outperforms state-of-the-art methods on the CrossDocked2020 benchmark for every molecular property (Vina score, QED, SA), while significantly improving the generation time. TacoGFN achieves $-8.82$ in median docking score and $52.63\%$ in Novel Hit Rate.

Machine Learning

What problem does this paper attempt to address?

This paper addresses Structure-Based Drug Design (SBDD): searching a vast chemical space for molecules that bind strongly to a specific protein pocket while also being drug-like and synthesizable. Exhaustive virtual screening is inefficient, and deep generative models promise to be faster by generating molecules directly from the protein structure. However, because these models learn the distribution of a limited protein-ligand complex dataset, they struggle to generate novel molecules with significantly improved properties. The paper proposes TacoGFN, a target-conditioned Generative Flow Network (GFlowNet) that frames molecular generation as a reinforcement learning task. The objective is to search the wider chemical space for molecules with desirable properties rather than to fit the training data distribution. TacoGFN is conditioned on the protein pocket structure and uses predicted binding affinity, drug-likeness (QED), and synthesizability (SA) as its reward. It also introduces a docking score prediction model that leverages pre-trained pharmacophore representations to evaluate molecules efficiently. On the CrossDocked2020 benchmark, TacoGFN outperforms existing state-of-the-art methods on every reported molecular property (Vina score, QED, SA) while significantly improving generation time, achieving a median docking score of $-8.82$ and a Novel Hit Rate of $52.63\%$.
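To make the reward idea concrete, here is a minimal sketch of how a docking score, QED, and SA could be folded into a single non-negative reward, and what "sampling in proportion to reward" means for a handful of candidates. It is an illustration only, not the reward used in the paper: the multiplicative form, the `vina_floor` cutoff, the function names, and the example molecules and scores are all assumptions. The only facts relied on are that QED lies in [0, 1], that the SA score runs from 1 (easy to synthesize) to 10 (hard), and that more negative docking scores indicate stronger predicted binding.

```python
import random


def clip01(x):
    """Clamp a value to the interval [0, 1]."""
    return min(max(x, 0.0), 1.0)


def composite_reward(vina_score, qed, sa, vina_floor=-12.0):
    """Hypothetical reward combining three properties, each mapped to [0, 1].

    vina_floor is an assumed cutoff: scores at or below it earn the full
    affinity term, and scores at or above 0 earn nothing.
    """
    affinity_term = clip01(vina_score / vina_floor)  # more negative is better
    qed_term = clip01(qed)                           # already in [0, 1]
    sa_term = clip01((10.0 - sa) / 9.0)              # 1 (easy) -> 1.0, 10 (hard) -> 0.0
    return affinity_term * qed_term * sa_term


def sample_proportional(candidates, rewards, k, seed=0):
    """Draw k candidates with probability proportional to their reward.

    A GFlowNet is trained so that its sampler approximates this behaviour
    over a space far too large to enumerate; here the candidates are listed
    explicitly. At least one reward must be positive.
    """
    rng = random.Random(seed)
    return rng.choices(candidates, weights=rewards, k=k)


# Made-up candidates: (name, docking score, QED, SA score).
candidates = [
    ("mol_a", -9.0, 0.8, 2.5),
    ("mol_b", -6.0, 0.5, 5.5),
    ("mol_c", 1.0, 0.9, 2.0),  # positive docking score, so reward is 0
]
rewards = [composite_reward(v, q, s) for _, v, q, s in candidates]
names = [name for name, *_ in candidates]
draws = sample_proportional(names, rewards, k=1000)
```

Because the terms are multiplied, a molecule that fails on any one property receives a low reward however well it does on the others. Sampling in proportion to reward, rather than always picking the single best candidate, is what lets this kind of approach return a diverse set of high-reward molecules.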

Geometric-informed GFlowNets for Structure-Based Drug Design

Generative Flows on Synthetic Pathway for Drug Design

GFlowNet Pretraining with Inexpensive Rewards

RGFN: Synthesizable Molecular Generation Using GFlowNets

DGFN: Double Generative Flow Networks

Genetic-guided GFlowNets for Sample Efficient Molecular Optimization

Cell Morphology-Guided Small Molecule Generation with GFlowNets

PocketFlow is a data-and-knowledge-driven structure-based molecular generative model

Rectified Flow For Structure Based Drug Design

Innovative Drug-like Molecule Generation from Flow-based Generative Model

Generative network complex (GNC) for drug discovery

SynFlowNet: Design of Diverse and Novel Molecules with Synthesis Constraints

Fragment-Based Ligand Generation Guided By Geometric Deep Learning On Protein-Ligand Structure

DrugGen: Advancing Drug Discovery with Large Language Models and Reinforcement Learning Feedback

Dynamic Backtracking in GFlowNets: Enhancing Decision Steps with Reward-Dependent Adjustment Mechanisms

Learning Subpocket Prototypes for Generalizable Structure-based Drug Design

Molecule Generation For Target Protein Binding with Structural Motifs

Improving drug discovery with a hybrid deep generative model using reinforcement learning trained on a Bayesian docking approximation

Learning to design drug-like molecules in three-dimensional space using deep generative models