Environmental toxicity risk evaluation of nitroaromatic compounds: Machine learning driven binary/multiple classification and design of safe alternatives

Yuxing Hao, Tengjiao Fan, Guohui Sun, Feifan Li, Na Zhang, Lijiao Zhao, Rugang Zhong
DOI: https://doi.org/10.1016/j.fct.2022.113461
IF: 4.3
2022-12-01
Food and Chemical Toxicology
Abstract: Nitroaromatic compounds (NACs) represent a significant source of organic pollutants in the environment. In this study, a well-rounded dataset containing 371 NACs with rat oral median lethal doses (LD50s) was developed. Based on this dataset, binary and multiple classification models were established. Seven machine learning algorithms were combined with six fingerprints to build the prediction models. For the binary classification models, the overall predictive accuracy of 10-fold cross-validation on the training set ranged from 0.823 to 0.874 across the top ten models. For the multiple classification models, the combination of graph fingerprint and random forest (Graph-RF) yielded the best predictive performance, with AUC values of 0.929 and 0.956 for the training set and the test set, respectively. Model prediction performance was further evaluated using a true external set of 1366 NACs, 96.6% of which fell within the applicability domain. Further, we determined the structural features influencing acute oral toxicity based on information gain and substructure frequency analysis. Finally, we identified highly toxic compounds based on the structural alerts and successfully transformed a representative highly toxic compound into low-toxicity alternatives via structural modification. Overall, the constructed models facilitate environmental risk assessment and the design of green and safe chemicals.
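The abstract names information gain as one of the two criteria used to find toxicity-relevant substructures, but gives no formula. The snippet below is a minimal sketch of the standard definition (class entropy minus conditional entropy given a binary substructure flag), not the authors' implementation. The function names and the toy labels and flags are invented for illustration and are not taken from the paper's dataset.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(has_substructure, labels):
    """Reduction in class entropy from splitting on a binary substructure flag."""
    present = [y for x, y in zip(has_substructure, labels) if x]
    absent = [y for x, y in zip(has_substructure, labels) if not x]
    n = len(labels)
    conditional = (
        (len(present) / n) * entropy(present)
        + (len(absent) / n) * entropy(absent)
    )
    return entropy(labels) - conditional


# Toy data, invented purely for illustration (not from the paper).
labels = ["toxic", "toxic", "toxic", "nontoxic",
          "nontoxic", "nontoxic", "nontoxic", "toxic"]
flag_a = [1, 1, 1, 0, 0, 0, 0, 1]  # present exactly in the toxic compounds
flag_b = [1, 0, 1, 0, 1, 0, 1, 0]  # evenly spread over both classes

ig_a = information_gain(flag_a, labels)
ig_b = information_gain(flag_b, labels)
print(f"IG(flag_a) = {ig_a:.3f}, IG(flag_b) = {ig_b:.3f}")
```

Under this definition, a substructure whose presence separates the classes perfectly receives the maximum gain (equal to the class entropy), while one distributed evenly across classes receives zero. Ranking fingerprint bits by this score is one plausible way to shortlist candidate structural alerts.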
toxicology,food science & technology