A Ternary Classification Using Machine Learning Methods of Distinct Estrogen Receptor Activities Within A Large Collection of Environmental Chemicals

Quan Zhang, Lu Yan, Yan Wu, Li Ji, Yuanchen Chen, Meirong Zhao, Xiaowu Dong
DOI: https://doi.org/10.1016/j.scitotenv.2016.12.088
2017-01-01
Abstract: Endocrine-disrupting chemicals (EDCs), which can threaten ecological safety and harm human health, have caused wide concern. Efficient methods for evaluating potential EDCs in the environment are in high demand. Herein, an evaluation platform was developed using novel and statistically robust ternary models built with different machine learning methods (i.e., linear discriminant analysis, classification and regression trees, and support vector machines). The platform aims to classify chemicals as having agonistic, antagonistic, or no estrogen receptor (ER) activity. A total of 440 chemicals from the literature were selected to derive and optimize the three-class model. One hundred and nine new chemicals that appeared on the 2014 EPA list for EDC screening were used to assess predictive performance by comparing E-screen results with the predictions of the classification models. The best model was obtained using support vector machines (SVM); it recognized agonists and antagonists with accuracies of 76.6% and 75.0%, respectively, on the test set (overall predictive accuracy of 75.2%) and achieved a 10-fold cross-validation (CV) accuracy of 73.4%. The external prediction accuracy validated by the E-screen assay was 87.5%, demonstrating the model's value as a virtual alert for EDCs with ER agonistic or antagonistic activity. The ternary computational model could thus serve as a faster and less expensive method to identify EDCs that act through nuclear receptors and to classify these chemicals into different mechanism groups.
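The abstract reports both per-class and overall accuracies for a three-class model. As a minimal sketch of how such figures relate, the snippet below computes them from a confusion matrix. It assumes that the accuracy for a class means the fraction of true members of that class that were predicted correctly; the abstract does not define this. The counts, the `LABELS` list, and the function names are hypothetical and chosen only for illustration; they are not the paper's data or code.

```python
# Hypothetical three-class confusion matrix: rows = true class, columns = predicted class.
# The counts are made up for illustration; they are NOT results from the paper.
LABELS = ["agonist", "antagonist", "inactive"]
confusion = [
    [30, 4, 6],   # true agonists
    [3, 15, 2],   # true antagonists
    [8, 5, 47],   # true inactives
]


def per_class_accuracy(matrix):
    """Fraction of each true class that was predicted correctly (diagonal / row sum)."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


def overall_accuracy(matrix):
    """Fraction of all samples that were predicted correctly (trace / total)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


class_acc = dict(zip(LABELS, per_class_accuracy(confusion)))
overall = overall_accuracy(confusion)

for label, acc in class_acc.items():
    print(f"{label}: {acc:.1%}")
print(f"overall: {overall:.1%}")
```

Because the overall figure weights each class by its sample count, it can sit between or outside the agonist and antagonist values depending on how well the inactive class is predicted.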