In Silico Prediction of Androgenic and Nonandrogenic Compounds Using Random Forest

Yan Li, Yonghua Wang, Jun Ding, Yuan Wang, Yaqing Chang, Shuwei Zhang
DOI: https://doi.org/10.1002/qsar.200810100
2009-01-01
QSAR & Combinatorial Science
Abstract: The purpose of the present study was to develop in silico models that reliably predict androgenic and nonandrogenic compounds, based on a large, diverse dataset of 205 compounds. Random Forest (RF) was applied as a new classification method; its ability to classify these compounds in terms of their Quantitative Structure-Activity Relationships (QSAR) was evaluated and compared with the widely used Partial Least Squares (PLS) analysis on the same dataset. The predictive power of both methods was verified with five-fold cross-validation and an independent test set. For the RF model, the cross-validation prediction accuracies for androgenic and nonandrogenic compounds are 81.0% and 77.0%, respectively, and an average of 87.3% of compounds are correctly classified in the external tests. PLS is slightly weaker, with average prediction accuracies of 75% and 74.7% for cross-validation and external validation, respectively. Our analysis demonstrates that RF is a powerful tool for building models on such data and should be valuable for virtual screening of androgen receptor-binding ligands.
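The per-class accuracies quoted in the abstract can be read as the fraction of compounds of each class that are assigned to the correct class when every compound is predicted once while held out of training. The following is a minimal Python sketch of that bookkeeping under five-fold cross-validation. All names here (`five_fold_indices`, `per_class_accuracy`, `cross_validate`, and the one-descriptor threshold classifier) are hypothetical illustrations, the toy data are invented, and the placeholder classifier merely stands in for a model; it is not the authors' RF or PLS procedure.

```python
import random


def five_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def per_class_accuracy(y_true, y_pred):
    """Return {label: fraction of that label's samples predicted correctly}."""
    totals, correct = {}, {}
    for t, p in zip(y_true, y_pred):
        totals[t] = totals.get(t, 0) + 1
        correct[t] = correct.get(t, 0) + (1 if t == p else 0)
    return {label: correct[label] / totals[label] for label in totals}


def cross_validate(X, y, fit, predict, k=5, seed=0):
    """Predict every sample exactly once, using a model trained on the other folds."""
    y_pred = [None] * len(y)
    for held_out in five_fold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            y_pred[i] = predict(model, X[i])
    return per_class_accuracy(y, y_pred)


# Placeholder classifier (assumption, NOT Random Forest): cut a single
# descriptor at the midpoint between the two class means.
def fit_threshold(X, y):
    mean = lambda values: sum(values) / len(values)
    m1 = mean([x for x, t in zip(X, y) if t == 1])
    m0 = mean([x for x, t in zip(X, y) if t == 0])
    return (m0 + m1) / 2, m1 > m0


def predict_threshold(model, x):
    cut, high_is_positive = model
    return int((x > cut) == high_is_positive)


if __name__ == "__main__":
    # Invented toy data: 1 = "androgenic", 0 = "nonandrogenic".
    X = [0.1 * i for i in range(20)]
    y = [0] * 10 + [1] * 10
    print(cross_validate(X, y, fit_threshold, predict_threshold))
```

Reporting accuracy per class rather than as one pooled figure shows whether a model favors one class at the expense of the other, which a single overall accuracy can hide.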