Prediction of Estrogen Receptor Agonists and Characterization of Associated Molecular Descriptors by Statistical Learning Methods.

H. Li,C. Y. Ung,C. W. Yap,Y. Xue,Z. R. Li,Y. Z. Chen
DOI: https://doi.org/10.1016/j.jmgm.2006.01.007
IF: 2.942
2006-01-01
Journal of Molecular Graphics and Modelling
Abstract:Specific estrogen receptor (ER) agonists have been used for hormone replacement therapy, contraception, osteoporosis prevention, and prostate cancer treatment. Some ER agonists and partial-agonists induce cancer and endocrine function disruption. Methods for predicting ER agonists are useful for facilitating drug discovery and chemical safety evaluation. Structure-activity relationships and rule-based decision forest models have been derived for predicting ER binders at impressive accuracies of 87.1-97.6% for ER binders and 80.2-96.0% for ER non-binders. However, these are not designed for identifying ER agonists and they were developed from a subset of known ER binders. This work explored several statistical learning methods (support vector machines, k-nearest neighbor, probabilistic neural network and C4.5 decision tree) for predicting ER agonists from comprehensive set of known ER agonists and other compounds. The corresponding prediction systems were developed and tested by using 243 ER agonists and 463 ER non-agonists, respectively, which are significantly larger in number and structural diversity than those in previous studies. A feature selection method was used for selecting molecular descriptors responsible for distinguishing ER agonists from non-agonists, some of which are consistent with those used in other studies and the findings from X-ray crystallography data. The prediction accuracies of these methods are comparable to those of earlier studies despite the use of significantly more diverse range of compounds. SVM gives the best accuracy of 88.9% for ER agonists and 98.1% for non-agonists. Our study suggests that statistical learning methods such as SVM are potentially useful for facilitating the prediction of ER agonists and for characterizing the molecular descriptors associated with ER agonists.
What problem does this paper attempt to address?