Learning Subpocket Prototypes for Generalizable Structure-based Drug Design

Zaixi Zhang, Qi Liu
2023-05-22
Abstract: Generating molecules with high binding affinities to target proteins (a.k.a. structure-based drug design) is a fundamental and challenging task in drug discovery. Recently, deep generative models have achieved remarkable success in generating 3D molecules conditioned on the protein pocket. However, most existing methods consider molecular generation for protein pockets independently while neglecting the underlying connections such as subpocket-level similarities. Subpockets are the local protein environments of ligand fragments and pockets with similar subpockets may bind the same molecular fragment (motif) even though their overall structures are different. Therefore, the trained models can hardly generalize to unseen protein pockets in real-world applications. In this paper, we propose a novel method DrugGPS for generalizable structure-based drug design. With the biochemical priors, we propose to learn subpocket prototypes and construct a global interaction graph to model the interactions between subpocket prototypes and molecular motifs. Moreover, a hierarchical graph transformer encoder and motif-based 3D molecule generation scheme are used to improve the model's performance. The experimental results show that our model consistently outperforms baselines in generating realistic drug candidates with high affinities in challenging out-of-distribution settings.
Biomolecules, Machine Learning
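The following is a minimal, hypothetical sketch of one simple way to map subpocket embeddings to prototypes and turn them into a prototype-motif interaction graph. It is not the authors' implementation. The paper learns its prototypes, whereas this sketch assumes three simplifications:

- The prototypes are fixed toy vectors.
- Each subpocket is hard-assigned to its most cosine-similar prototype.
- Edge weights are plain co-occurrence counts.

The names `cosine`, `assign_prototype`, and `build_interaction_graph`, the 2-D embeddings, and the motif labels are all illustrative assumptions.

```python
import math

# Hypothetical sketch: fixed toy prototypes, hard nearest-prototype
# assignment and raw co-occurrence counts are assumptions, not DrugGPS itself.

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def assign_prototype(subpocket_emb, prototypes):
    """Index of the prototype most cosine-similar to a subpocket embedding."""
    return max(range(len(prototypes)),
               key=lambda k: cosine(subpocket_emb, prototypes[k]))

def build_interaction_graph(complexes, prototypes):
    """Count prototype-motif co-occurrences over training complexes.

    Each complex is a list of (subpocket_embedding, motif_id) pairs: every
    ligand fragment (motif) paired with an embedding of its local subpocket.
    Returns {prototype_index: {motif_id: count}}.
    """
    graph = {}
    for pairs in complexes:
        for emb, motif in pairs:
            k = assign_prototype(emb, prototypes)
            edges = graph.setdefault(k, {})
            edges[motif] = edges.get(motif, 0) + 1
    return graph

# Toy usage: 2-D "embeddings" stand in for encoder outputs.
prototypes = [[1.0, 0.0], [0.0, 1.0]]
training = [
    [([0.9, 0.1], "benzene"), ([0.1, 0.8], "amide")],  # complex A
    [([0.8, 0.3], "benzene")],                         # complex B, different pocket
]
graph = build_interaction_graph(training, prototypes)
print(graph)  # {0: {'benzene': 2}, 1: {'amide': 1}}
```

The two benzene-binding subpockets come from different complexes yet land on the same prototype. This is the kind of cross-pocket sharing the abstract argues should help generalization to unseen pockets.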
What problem does this paper attempt to address?
The paper addresses how to generate molecules with high binding affinity to a target protein pocket in structure-based drug design, and in particular how to make such generative models generalize to pockets not seen during training. When generating 3D molecules for a specific pocket, existing structure-based drug design methods treat each pocket independently. They overlook local similarities between different pockets (i.e., subpocket-level similarities), and as a result the trained models generalize poorly to unseen protein pockets. To address this, the paper proposes DrugGPS, which improves generalization to unseen pockets by learning subpocket prototypes.

### Main Problems and Challenges

1. **Insufficient generalization**: Existing methods perform poorly on unseen protein pockets because they treat each pocket independently and ignore subpocket-level similarities.
2. **Limited data**: High-quality protein-ligand complex data is scarce, and the target protein pocket may not appear in the training dataset.
3. **Limitations of atom-level generation**: Existing methods focus mainly on atom-level interactions and generation, which can produce unrealistic or invalid molecules.

### Solutions

To address these challenges, the authors propose the following:

1. **Subpocket prototype learning**: Capture local similarities across different protein pockets by learning subpocket prototypes. A subpocket is defined as the local environment of a ligand fragment in a protein-ligand complex.
2. **Global interaction graph construction**: Construct a global interaction graph that models the interactions between subpocket prototypes and molecular fragments (motifs).
3. **Hierarchical graph transformer encoder**: Use a hierarchical graph transformer encoder to capture context at both the atom and residue levels.
4. **Motif-based molecule generation**: Build the molecule fragment by fragment (motif by motif), and enhance the subpocket representation during generation through a global information fusion step.

### Experimental Verification

The authors evaluate on the CrossDocked dataset using two splitting strategies, sequence-based clustering and pocket-based clustering, to test how well the model generalizes to out-of-distribution (OOD) pockets. The reported results show that DrugGPS outperforms existing baselines in generating molecules with high binding affinity and drug-likeness.

### Main Contributions

1. DrugGPS, a structure-based drug design method that improves generalization by learning subpocket prototypes.
2. A hierarchical 3D graph transformer that encodes atom-level and residue-level information simultaneously.
3. A subpocket prototype-molecular motif interaction graph, whose global interaction information is used during generation.
4. Experiments showing that DrugGPS generates more realistic drug candidates with higher binding affinity and drug-likeness in challenging OOD settings.

With these contributions, DrugGPS offers a new framework for structure-based drug design that may help generate more effective drug molecules in practical applications.
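The following is a minimal, hypothetical sketch of the global information fusion step and of how an interaction graph could guide motif-based generation. It is not DrugGPS's actual architecture. It makes three assumptions:

- **Fusion:** The current subpocket embedding is refined with a residual, softmax-attention-weighted sum of prototype vectors.
- **Scoring:** Candidate motifs are ranked by prototype-weighted, per-prototype-normalized co-occurrence counts.
- **Selection:** A greedy argmax stands in for the learned motif predictor.

The function names, the graph format (prototype index mapped to `{motif_id: count}`), and all toy values are illustrative.

```python
import math

# Hypothetical sketch: residual prototype attention and count-based motif
# scoring are assumptions standing in for DrugGPS's learned modules.

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_subpocket(h, prototypes):
    """Refine subpocket embedding h with an attention-weighted sum of prototypes.

    Attention logits are dot products h . p_k; the update is residual:
    h' = h + sum_k a_k * p_k. Returns (h', attention weights).
    """
    weights = softmax([sum(a * b for a, b in zip(h, p)) for p in prototypes])
    fused = [h[i] + sum(w * p[i] for w, p in zip(weights, prototypes))
             for i in range(len(h))]
    return fused, weights

def score_motifs(weights, graph, motifs):
    """Score each candidate motif by prototype-weighted interaction-graph evidence.

    graph maps prototype index -> {motif_id: count}; counts are normalized
    per prototype so that frequently seen prototypes do not dominate.
    """
    scores = {}
    for m in motifs:
        s = 0.0
        for k, w in enumerate(weights):
            edges = graph.get(k, {})
            total = sum(edges.values())
            if total:
                s += w * edges.get(m, 0) / total
        scores[m] = s
    return scores

# Toy usage (all values illustrative).
prototypes = [[1.0, 0.0], [0.0, 1.0]]
graph = {0: {"benzene": 2}, 1: {"amide": 1}}
h = [2.0, 0.0]                               # current subpocket embedding
fused, weights = fuse_subpocket(h, prototypes)
scores = score_motifs(weights, graph, ["benzene", "amide"])
next_motif = max(scores, key=scores.get)     # greedy pick for illustration
print(next_motif)  # benzene
```

The residual form keeps the pocket-specific signal in `h` while borrowing information shared across pockets through the prototypes. A trained model would replace the fixed dot-product attention and count-based scores with learned components.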