Benchmarking Methods and Data Sets for Ligand Enrichment Assessment in Virtual Screening.

Jie Xia, Ermias Lemma Tilahun, Terry-Elinor Reid, Liangren Zhang, Xiang Simon Wang
DOI: https://doi.org/10.1016/j.ymeth.2014.11.015
IF: 4.647
2014-01-01
Methods
Abstract: Retrospective small-scale virtual screening (VS) based on benchmarking data sets has been widely used to estimate the ligand enrichment of VS approaches in prospective (i.e., real-world) efforts. However, intrinsic differences between benchmarking sets and real screening chemical libraries can bias the assessment. Herein, we summarize the history of benchmarking methods and data sets and highlight three main types of bias found in benchmarking sets: "analogue bias", "artificial enrichment" and "false negative". In addition, we introduce our recent algorithm for building maximum-unbiased benchmarking sets applicable to both ligand-based and structure-based VS approaches, and its application to three important human histone deacetylase (HDAC) isoforms: HDAC1, HDAC6 and HDAC8. Leave-one-out cross-validation (LOO CV) demonstrates that the benchmarking sets built by our algorithm are maximum-unbiased as measured by property matching, ROC curves and AUCs.
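
As a rough illustration of the ROC AUC metric named in the abstract, the sketch below computes AUC for a ranked list of actives and decoys using the rank-based (Mann-Whitney) formulation. It is not the authors' code: the function name `roc_auc`, the convention that a higher score means "more likely active", and the toy scores are all assumptions made for this example.

```python
def roc_auc(scores, labels):
    """Return ROC AUC for a virtual-screening ranking.

    Assumptions (not from the paper): higher score = predicted more likely
    active; labels are 1 for known actives and 0 for decoys.
    AUC equals the probability that a randomly chosen active outscores a
    randomly chosen decoy, with ties counted as half a win.
    """
    actives = [s for s, y in zip(scores, labels) if y == 1]
    decoys = [s for s, y in zip(scores, labels) if y == 0]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")

    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))


# Hypothetical toy screen: two actives ranked among four decoys.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
toy_labels = [1, 0, 1, 0, 0, 0]
print(roc_auc(toy_scores, toy_labels))  # 0.875
```

An AUC near 0.5 indicates a ranking no better than random, while values near 1.0 indicate strong early separation of actives from decoys. A benchmarking set that yields a high AUC for a trivial property-based score would be a sign of the artificial enrichment the abstract describes.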