Accuracy or novelty: what can we gain from target-specific machine-learning-based scoring functions in virtual screening?

Chao Shen,Gaoqi Weng,Xujun Zhang,Elaine Lai-Han Leung,Xiaojun Yao,Jinping Pang,Xin Chai,Dan Li,Ercheng Wang,Dongsheng Cao,Tingjun Hou
DOI: https://doi.org/10.1093/bib/bbaa410
IF: 9.5
2021-01-08
Briefings in Bioinformatics
Abstract: Machine-learning (ML)-based scoring functions (MLSFs) have gradually emerged as a promising alternative for protein–ligand binding affinity prediction and structure-based virtual screening. However, clouds of doubts have still been raised against the benefits of this novel type of scoring functions (SFs). In this study, to benchmark the performance of target-specific MLSFs on a relatively unbiased dataset, the MLSFs trained from three representative protein–ligand interaction representations were assessed on the LIT-PCBA dataset, and the classical Glide SP SF and three types of ligand-based quantitative structure-activity relationship (QSAR) models were also utilized for comparison. Two major aspects in virtual screening campaigns, including prediction accuracy and hit novelty, were systematically explored. The calculation results illustrate that the tested target-specific MLSFs yielded generally superior performance over the classical Glide SP SF, but they could hardly outperform the 2D fingerprint-based QSAR models. Although substantial improvements could be achieved by integrating multiple types of protein–ligand interaction features, the MLSFs were still not sufficient to exceed MACCS-based QSAR models. In terms of the correlations between the hit ranks or the structures of the top-ranked hits, the MLSFs developed by different featurization strategies would have the ability to identify quite different hits. Nevertheless, it seems that target-specific MLSFs do not have the intrinsic attributes of a traditional SF and may not be a substitute for classical SFs. In contrast, MLSFs can be regarded as a new derivative of ligand-based QSAR models. It is expected that our study may provide valuable guidance for the assessment and further development of target-specific MLSFs.
biochemical research methods,mathematical & computational biology
What problem does this paper attempt to address?
This paper evaluates how target-specific machine-learning-based scoring functions (MLSFs) perform in virtual screening, in terms of both predictive accuracy and hit novelty, relative to the classical Glide SP scoring function and ligand-based quantitative structure-activity relationship (QSAR) models. Specifically, the researchers focus on:

1. **Predictive Accuracy**: Assessing whether target-specific MLSFs perform better in predicting protein-ligand binding affinity compared to classical scoring functions (such as Glide SP) and ligand-based QSAR models.
2. **Hit Novelty**: Investigating whether target-specific MLSFs can identify active compounds with novel structures, rather than just those similar to known active compounds.

Through these evaluations, the researchers aim to provide valuable guidance for the application of target-specific MLSFs in virtual screening and to further advance research and development in this field.
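
The two evaluation aspects named above, predictive accuracy and hit novelty, could be quantified roughly as in the sketch below. The abstract does not say which metrics the authors used, so the enrichment factor and the Tanimoto-based novelty score here are assumptions (both are common choices in virtual screening), and the function names and toy data are purely illustrative.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """Hit rate in the top-ranked fraction divided by the overall hit rate.

    Assumes a higher score means "more likely active"; negate docking-style
    scores (where more negative is better) before calling.
    """
    n = len(scores)
    n_top = max(1, int(round(n * fraction)))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_hits = sum(label for _, label in ranked[:n_top])
    total_hits = sum(labels)
    if total_hits == 0:
        raise ValueError("no actives in the dataset")
    return (top_hits / n_top) / (total_hits / n)


def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def novelty(hit_bits, known_active_bits):
    """1 minus the highest similarity to any known active (assumed definition)."""
    return 1.0 - max(tanimoto(hit_bits, known) for known in known_active_bits)


# Toy data: 10 compounds, 2 actives, evaluated at the top 20%.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
ef_20 = enrichment_factor(scores, labels, fraction=0.2)

# Toy fingerprints as sets of on-bit indices (hypothetical values).
hit_novelty = novelty({1, 2, 3}, [{2, 3, 4}, {1, 2, 3, 5}])
```

An enrichment factor above 1 means the model ranks actives ahead of random selection, while a novelty score near 1 means a hit shares few fingerprint bits with any known active.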