Abstract: In recent years, protein-ligand interaction scoring functions derived through machine learning have repeatedly been reported to outperform conventional scoring functions. However, several published studies have raised the concern that the superior performance of machine-learning scoring functions depends on the overlap between the training set and the test set. In order to examine the true power of machine-learning algorithms in scoring function formulation, we have conducted a systematic study of six off-the-shelf machine-learning algorithms: Bayesian Ridge Regression (BRR), Decision Tree (DT), K-Nearest Neighbors (KNN), Multilayer Perceptron (MLP), Linear Support Vector Regression (L-SVR), and Random Forest (RF). Model scoring functions were derived with these machine-learning algorithms on various training sets selected from over 3700 protein-ligand complexes in the PDBbind refined set (version 2016). All resulting scoring functions were then applied to the CASF-2016 test set to validate their scoring power. In our first series of trials, the size of the training set was fixed, while the overall similarity between the training set and the test set was varied systematically. In our second series of trials, the overall similarity between the training set and the test set was fixed, while the size of the training set was varied. Our results indicate that the performance of these machine-learning models depends, to varying degrees, on the contents and the size of the training set, and that the RF model demonstrates the best learning capability. In contrast, the performance of three conventional scoring functions (i.e., ChemScore, ASP, and X-Score) is largely insensitive to the choice of training set. Therefore, one has to consider not only "hard overlap" but also "soft overlap" between the training set and the test set in order to evaluate machine-learning scoring functions.
In this spirit, we have compiled data sets based on the PDBbind refined set by removing redundant samples under several similarity thresholds. Scoring function developers are encouraged to employ them as standard training sets if they want to evaluate their new models on the CASF-2016 benchmark.
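As an illustration of the distinction between "hard overlap" and "soft overlap", the following minimal Python sketch filters a training set against a test set under a similarity threshold. Everything in it is an assumption made for illustration: the complex identifiers and feature sets are invented toy data, and the Jaccard similarity over feature sets is only a placeholder, not the similarity measure used in the study.

```python
from typing import Dict, FrozenSet


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity between two feature sets (placeholder metric)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def remove_soft_overlap(
    train: Dict[str, FrozenSet[str]],
    test: Dict[str, FrozenSet[str]],
    threshold: float,
) -> Dict[str, FrozenSet[str]]:
    """Return the training complexes that survive both overlap filters.

    Hard overlap: the same complex identifier appears in both sets.
    Soft overlap: the complex is at least `threshold` similar to some
    test complex, even though the identifiers differ.
    """
    kept = {}
    for name, feats in train.items():
        if name in test:  # hard overlap
            continue
        max_sim = max((jaccard(feats, t) for t in test.values()), default=0.0)
        if max_sim < threshold:  # no soft overlap at this threshold
            kept[name] = feats
    return kept


# Hypothetical toy data; identifiers and features are invented.
test = {"1abc": frozenset({"kinase", "hinge", "aromatic"})}
train = {
    "1abc": frozenset({"kinase", "hinge", "aromatic"}),  # hard overlap
    "2xyz": frozenset({"kinase", "hinge", "halogen"}),   # similarity 0.5
    "3def": frozenset({"protease", "zinc"}),             # similarity 0.0
}

for threshold in (1.0, 0.6, 0.4):
    print(threshold, sorted(remove_soft_overlap(train, test, threshold)))
```

Lowering the threshold removes progressively more training samples that resemble the test set, which is the kind of control needed to compare models trained at different levels of train/test similarity.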

Tapping on the Black Box: How is the Scoring Power of a Machine-Learning Scoring Function Dependent on the Training Set?