Abstract: Molecular docking, the task of predicting the binding structure between a protein and a small-molecule ligand, plays a significant role in structure-based drug discovery. In recent years, numerous deep learning-based methods for molecular docking have emerged. State-of-the-art approaches such as DiffDock formulate the docking problem with diffusion generative models and outperform traditional docking algorithms. However, despite their strong performance in predicting binding poses, these deep learning-based docking methods often lack a well-defined scoring function. This limitation makes it difficult to distinguish strong inhibitors from weak ones during virtual screening. To address it, we introduce FeatureDock, a transformer-based deep learning framework that accurately predicts protein-ligand binding poses and achieves strong scoring power for virtual screening. FeatureDock extracts chemical features from local environments within protein structures and uses a Transformer encoder to predict probability density envelopes that indicate where ligands are most likely to bind in the protein pocket. We also designed a scoring function that encodes the predicted probability density envelope to optimize and score ligand poses. In addition, the attention mechanism in FeatureDock's Transformer further enhances the model's interpretability by providing the attention weight of each chemical feature from the protein structure in predicting the binding probabilities. When applied to virtual screening, FeatureDock outperforms DiffDock, Smina and AutoDock Vina in distinguishing strong inhibitors from weak ones for both Cyclin-Dependent Kinase 2 (CDK2, in its inactive form) and angiotensin-converting enzyme (ACE). Performance was assessed using the Kullback–Leibler (KL) divergence and the area under the receiver operating characteristic curve (AUC).
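The two screening metrics named above can be illustrated with a minimal sketch. It assumes each compound has a single docking score where higher means a predicted stronger binder (Vina-style energies would be negated first). ROC AUC is computed as the probability that a strong inhibitor outscores a weak one. KL divergence is estimated from histograms of the strong and weak score distributions on shared bins. The bin count, the `eps` smoothing, and the direction KL(strong || weak) are illustrative assumptions, not the paper's settings, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def roc_auc(strong: Sequence[float], weak: Sequence[float]) -> float:
    """ROC AUC as the probability that a strong inhibitor scores higher than a
    weak one (ties count as half). Assumes higher score = stronger predicted binder."""
    wins = 0.0
    for s in strong:
        for w in weak:
            if s > w:
                wins += 1.0
            elif s == w:
                wins += 0.5
    return wins / (len(strong) * len(weak))


def _histogram(values: Sequence[float], lo: float, width: float, n_bins: int) -> List[int]:
    """Count values into n_bins equal-width bins starting at lo."""
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # clamp the maximum into the last bin
        counts[idx] += 1
    return counts


def kl_divergence(p_values: Sequence[float], q_values: Sequence[float],
                  n_bins: int = 10, eps: float = 1e-6) -> float:
    """Histogram estimate of KL(P || Q) between two score samples on shared bins.
    eps smoothing avoids log(0) for empty bins (an illustrative choice)."""
    lo = min(min(p_values), min(q_values))
    hi = max(max(p_values), max(q_values))
    width = (hi - lo) / n_bins or 1.0  # all scores identical: any positive width works
    p = [c + eps for c in _histogram(p_values, lo, width, n_bins)]
    q = [c + eps for c in _histogram(q_values, lo, width, n_bins)]
    zp, zq = sum(p), sum(q)
    return sum((pi / zp) * math.log((pi / zp) / (qi / zq)) for pi, qi in zip(p, q))


if __name__ == "__main__":
    strong = [9.1, 8.7, 8.2, 7.9]  # made-up scores, higher = better
    weak = [6.5, 7.0, 8.0, 5.9]
    print(roc_auc(strong, weak))  # 0.9375
    print(kl_divergence(strong, weak))
```

An AUC near 1 and a large KL value both indicate that the scoring function separates the two groups well. The double loop is quadratic in the number of compounds, which is fine for a few hundred ligands. A rank-based formula would scale better.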
We also demonstrate that FeatureDock accurately predicts binding poses, achieving an average RMSD of 2.4 Å relative to CDK2-ligand co-crystal structures. We anticipate that FeatureDock can be widely applied in virtual screening to assist drug design. FeatureDock is available at https://github.com/xuhuihuang/featuredock.
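A pose RMSD like the 2.4 Å figure above compares matched atoms of a predicted ligand pose and the crystallographic one. The sketch below assumes both poses are already in the receptor frame with identical atom ordering. It applies no superposition, as is common in docking benchmarks, so errors in where the ligand sits in the pocket are counted. It also ignores symmetry-equivalent atom mappings, which dedicated cheminformatics tools handle. `pose_rmsd` and `mean_rmsd` are hypothetical names.

```python
import math
from typing import Iterable, Sequence, Tuple

Coord = Tuple[float, float, float]


def pose_rmsd(pred: Sequence[Coord], ref: Sequence[Coord]) -> float:
    """RMSD (in the coordinates' units, e.g. angstroms) between matched atoms.

    Assumes both poses share the receptor frame and atom ordering. No
    superposition is applied, so translation errors in the pocket count.
    Symmetry-equivalent atom mappings are not considered."""
    if len(pred) != len(ref) or len(pred) == 0:
        raise ValueError("poses must list the same, non-zero number of atoms")
    sq = sum((a - b) ** 2 for p, r in zip(pred, ref) for a, b in zip(p, r))
    return math.sqrt(sq / len(pred))


def mean_rmsd(pairs: Iterable[Tuple[Sequence[Coord], Sequence[Coord]]]) -> float:
    """Average pose RMSD over (predicted, reference) pairs, e.g. one per complex."""
    values = [pose_rmsd(pred, ref) for pred, ref in pairs]
    return sum(values) / len(values)


if __name__ == "__main__":
    crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
    shifted = [(x + 2.0, y, z) for x, y, z in crystal]  # same pose moved 2 Å along x
    print(pose_rmsd(shifted, crystal))  # 2.0
```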

What problem does this paper attempt to address?

This paper introduces FeatureDock, a new method for protein-ligand docking, the task of predicting the binding structure between a protein and a small-molecule ligand that is critical in drug discovery. Existing deep learning methods predict binding poses well but lack effective scoring functions for distinguishing strong inhibitors from weak ones. FeatureDock uses a Transformer to learn from physicochemical feature-based representations of local protein environments and predicts where ligands are most likely to bind in the protein pocket. It also includes a scoring function to optimize and evaluate ligand poses. In virtual screening experiments, FeatureDock outperforms DiffDock, Smina, and AutoDock Vina in discriminating strong from weak inhibitors of CDK2 (inactive form) and angiotensin-converting enzyme (ACE). Moreover, FeatureDock accurately predicts binding poses, with an average RMSD of 2.4 Å relative to CDK2-ligand co-crystal structures. The authors anticipate that FeatureDock can be widely applied in virtual screening and drug design.
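To make the idea of scoring a pose against a predicted probability density concrete, here is a hypothetical sketch. It is not FeatureDock's actual scoring function. It assumes the model has already produced a binding probability for each grid point in the pocket. Each pose is scored as the mean log-probability of the grid point nearest to each ligand atom, and atoms outside the envelope receive a floor value. `envelope_score`, `rank_poses`, `cutoff`, and `floor` are illustrative names and values.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def envelope_score(ligand_atoms: Sequence[Point], grid_probs: Dict[Point, float],
                   cutoff: float = 1.0, floor: float = 1e-3) -> float:
    """Hypothetical pose score: mean log-probability of the nearest predicted grid
    point (within `cutoff` angstroms) to each ligand atom. Atoms with no grid point
    in range, i.e. outside the envelope, contribute log(floor)."""
    total = 0.0
    for atom in ligand_atoms:
        best_d2 = cutoff ** 2
        prob = floor
        for point, p in grid_probs.items():
            d2 = sum((a - b) ** 2 for a, b in zip(atom, point))
            if d2 <= best_d2:  # closest grid point seen so far within the cutoff
                best_d2 = d2
                prob = max(p, floor)
        total += math.log(prob)
    return total / len(ligand_atoms)


def rank_poses(poses: Sequence[Sequence[Point]], grid_probs: Dict[Point, float]) -> List[int]:
    """Indices of poses ordered from best (highest score) to worst."""
    scores = [envelope_score(pose, grid_probs) for pose in poses]
    return sorted(range(len(poses)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    grid = {(0.0, 0.0, 0.0): 0.9, (1.0, 0.0, 0.0): 0.8, (5.0, 5.0, 5.0): 0.1}  # toy envelope
    inside = [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)]
    outside = [(10.0, 10.0, 10.0), (11.0, 10.0, 10.0)]
    print(rank_poses([outside, inside], grid))  # [1, 0]
```

Higher scores mean the ligand atoms sit in higher-probability regions, so a pose optimizer could maximize this quantity. The linear scan over grid points is the simplest correct choice. A spatial index such as a k-d tree would be the usual replacement for large pockets.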

FeatureDock: Protein-Ligand Docking Guided by Physicochemical Feature-Based Local Environment Learning using Transformer