Abstract: The field of protein-ligand pose prediction has seen significant advances in recent years, with machine learning-based methods now being commonly used in lieu of classical docking methods or even to predict all-atom protein-ligand complex structures. Most contemporary studies focus on the accuracy and physical plausibility of ligand placement to determine pose quality, often neglecting a direct assessment of the interactions observed with the protein. In this work, we demonstrate that ignoring protein-ligand interaction fingerprints can lead to overestimation of model performance, most notably in recent protein-ligand cofolding models which often fail to recapitulate key interactions.
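The abstract argues that pose-level metrics alone can overstate performance. The following is a minimal sketch of that effect. It rests on several assumptions: ligand heavy-atom RMSD is computed in the reference protein frame with matching atom order and no symmetry correction, success uses the conventional 2 Å cutoff, and each prediction carries a hypothetical `plif_recovery` fraction. The toy coordinates and numbers are invented for illustration and are not results from the paper.

```python
import math
from dataclasses import dataclass

def ligand_rmsd(pred, ref):
    """Heavy-atom ligand RMSD in the reference protein frame.

    Assumes the predicted complex is already aligned onto the reference protein,
    that both coordinate lists share the same atom order, and that no symmetry
    correction is needed (real benchmarks usually apply one).
    """
    if not pred or len(pred) != len(ref):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    sq_dist = sum((p - r) ** 2 for a, b in zip(pred, ref) for p, r in zip(a, b))
    return math.sqrt(sq_dist / len(pred))

@dataclass
class Prediction:
    rmsd: float           # Å, versus the crystal pose
    plif_recovery: float  # hypothetical field: fraction of reference interactions reproduced

def success_rates(preds, rmsd_cutoff=2.0, min_recovery=1.0):
    """Return (RMSD-only success rate, RMSD-plus-PLIF success rate)."""
    if not preds:
        raise ValueError("no predictions given")
    n = len(preds)
    rmsd_ok = sum(p.rmsd <= rmsd_cutoff for p in preds)
    both_ok = sum(
        p.rmsd <= rmsd_cutoff and p.plif_recovery >= min_recovery for p in preds
    )
    return rmsd_ok / n, both_ok / n

# Toy check: every atom shifted by 1 Å along x gives an RMSD of exactly 1 Å.
ref = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
pred = [(x + 1.0, y, z) for x, y, z in ref]
print(ligand_rmsd(pred, ref))  # 1.0

# Invented numbers: two poses pass the 2 Å cutoff, but one of them misses half its interactions.
toy = [Prediction(1.0, 1.0), Prediction(1.6, 0.5), Prediction(2.8, 1.0)]
print(success_rates(toy))  # RMSD-only ≈ 0.67, RMSD + full PLIF recovery ≈ 0.33
```

Requiring full recovery (`min_recovery=1.0`) is one strict choice. A lower threshold trades strictness for tolerance of minor contact differences.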

What problem does this paper attempt to address?

The paper primarily addresses how to evaluate the ability of different methods to recover key protein-ligand interactions when predicting protein-ligand complex structures. Specifically:

1. **Problem Background**: In recent years, machine learning-based methods have made significant progress in predicting protein-ligand binding poses. These methods are often used in place of traditional docking and can even predict all-atom protein-ligand complex structures directly. However, existing studies mostly focus on the accuracy and physical plausibility of ligand placement, while neglecting a direct evaluation of the interactions the ligand forms with the protein.
2. **Research Objective**: The paper shows experimentally that ignoring protein-ligand interaction fingerprints (PLIFs) can lead to an overestimation of model performance. This is especially true of some recent protein-ligand co-folding models, which often fail to reproduce key interactions. The authors therefore propose PLIFs as a useful metric of model quality and use them to benchmark a range of modern pose prediction tools.
3. **Method Comparison**: The paper compares several classical docking algorithms (e.g., GOLD), machine learning docking algorithms (e.g., DiffDock-L), and protein-ligand co-folding models (e.g., RoseTTAFold-AllAtom). It finds that classical docking algorithms generally recover key interactions better than the machine learning methods, particularly important interactions such as hydrogen bonds.
4. **Conclusion**: By introducing the PLIF recovery rate as a metric, the paper emphasizes that drug discovery applications should assess the recovery of protein-ligand interactions in addition to RMSD and PoseBusters validity. This suggests directions for improving machine learning models, such as incorporating explicit PLIF-sensitive or pharmacophore-sensitive loss functions during training.
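The PLIF recovery rate can be sketched as a set comparison. The sketch assumes each fingerprint is a set of `(residue, interaction_type)` pairs, loosely modelled on what fingerprinting tools such as ProLIF report. It also assumes that only interactions present in the reference are scored. The residue labels, interaction names, and scoring choice are hypothetical illustrations, not the paper's exact protocol.

```python
from collections import Counter

# A fingerprint is modelled as a set of (residue, interaction_type) pairs.
# Residue labels and interaction names below are hypothetical examples.
Interaction = tuple[str, str]

def plif_recovery(reference: set[Interaction], predicted: set[Interaction]) -> float:
    """Fraction of reference interactions that also appear in the predicted pose.

    Interactions present only in the prediction are ignored: the metric asks
    whether the crystal-structure contacts were recovered.
    """
    if not reference:
        raise ValueError("reference fingerprint is empty; recovery is undefined")
    return len(reference & predicted) / len(reference)

def recovery_by_type(reference: set[Interaction], predicted: set[Interaction]) -> dict[str, float]:
    """Per-interaction-type recovery, e.g. to compare hydrogen-bond recovery across methods."""
    total = Counter(kind for _, kind in reference)
    hit = Counter(kind for _, kind in reference & predicted)
    return {kind: hit[kind] / total[kind] for kind in total}

ref_fp = {
    ("ASP86", "HBDonor"),
    ("LEU83", "HBDonor"),
    ("LEU83", "HBAcceptor"),
    ("PHE80", "PiStacking"),
}
pred_fp = {
    ("LEU83", "HBDonor"),
    ("PHE80", "PiStacking"),
    ("ILE10", "Hydrophobic"),  # extra contact absent from the reference; not counted
}

print(plif_recovery(ref_fp, pred_fp))  # 0.5
print(recovery_by_type(ref_fp, pred_fp))
```

Scoring only reference interactions matches the idea of "recovering" observed contacts. A variant that also penalizes spurious predicted interactions (for example, a Tanimoto similarity between fingerprints) would answer a slightly different question.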

Assessing interaction recovery of predicted protein-ligand poses
