In Silico Prediction of Chemicals Binding to Aromatase with Machine Learning Methods.

Hanwen Du, Yingchun Cai, Hongbin Yang, Hongxiao Zhang, Yuhan Xue, Guixia Liu, Yun Tang, Weihua Li
DOI: https://doi.org/10.1021/acs.chemrestox.7b00037
2017-01-01
Chemical Research in Toxicology
Abstract: Environmental chemicals may affect endocrine systems through multiple mechanisms, one of which is via effects on aromatase (also known as CYP19A1), an enzyme critical for maintaining the normal balance of estrogens and androgens in the body. Therefore, rapid and efficient identification of aromatase-related endocrine disrupting chemicals (EDCs) is important for toxicology and environmental risk assessment. In this study, on the basis of the Tox21 10K compound library, in silico classification models for predicting aromatase binders/nonbinders were constructed by machine learning methods. To improve the prediction ability of the models, a combined classifier (CC) strategy that combines different independent machine learning methods was adopted. Model performance was measured on test and external validation sets containing 1336 and 216 chemicals, respectively. The best model was obtained with the MACCS (Molecular Access System) fingerprint and the CC method, and exhibited an accuracy of 0.84 on the test set and 0.91 on the external validation set. Additionally, several representative substructures for characterizing aromatase binders, such as ketone, lactone, and nitrogen-containing derivatives, were identified using information gain and substructure frequency analysis. Our study provides a systematic assessment of chemicals binding to aromatase. The built models can help to rapidly identify potential EDCs targeting aromatase.
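The abstract names information gain as one of the tools used to pick out substructures that characterize binders. A minimal sketch of that quantity for a single present/absent substructure flag is given below, using the standard definition IG = H(Y) - sum over v of P(X = v) * H(Y | X = v). The function names and the toy data are hypothetical and are not taken from the paper, whose exact procedure the abstract does not describe.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(has_substructure, labels):
    """Information gain of a binary substructure flag with respect to the labels.

    has_substructure: one truthy/falsy value per chemical (substructure present?).
    labels: one class label per chemical (e.g. 1 = binder, 0 = nonbinder).
    """
    if len(has_substructure) != len(labels) or not labels:
        raise ValueError("inputs must be non-empty and of equal length")
    total = len(labels)
    conditional = 0.0
    # Weighted entropy of the labels within the "present" and "absent" groups.
    for value in (True, False):
        subset = [y for x, y in zip(has_substructure, labels) if bool(x) == value]
        conditional += (len(subset) / total) * entropy(subset)
    return entropy(labels) - conditional


# Made-up illustration data (NOT from the Tox21 library or the paper).
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_bit_a = [1, 1, 1, 0, 0, 0, 0, 0]  # present exactly in the binders
toy_bit_b = [1, 0, 1, 0, 1, 0, 1, 0]  # only weakly related to the labels

ig_a = information_gain(toy_bit_a, toy_labels)
ig_b = information_gain(toy_bit_b, toy_labels)
print(f"IG(bit_a) = {ig_a:.4f}, IG(bit_b) = {ig_b:.4f}")
```

A flag that splits binders from nonbinders perfectly recovers the full label entropy, while a flag that is nearly independent of the labels scores close to zero. Ranking fingerprint bits by this score is one common way to shortlist candidate substructures.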