Ensemble Multiclassification Model for Aquatic Toxicity of Organic Compounds

Xinran Li, Gaohua Liu, Zhibo Wang, Li Zhang, Hongsheng Liu, Haixin Ai
DOI: https://doi.org/10.1016/j.aquatox.2022.106379
IF: 5.202
2023-01-01
Aquatic Toxicology
Abstract: With environmental pollution becoming increasingly serious, organic compounds have become the main hazard of environmental pollution and exert substantial negative impacts on aquatic organisms. In research on the acute toxicity of organic compounds, traditional biological experimental methods are time-consuming and expensive. In addition, computer-aided binary classification models cannot accurately classify acute toxicity. Therefore, a multiclassification model is necessary for more accurate classification of acute toxicity. In this study, median lethal concentrations of 373 organic compounds in the environmental toxicology datasets ECO-TOX and EAT5 were used. These chemicals were classified into four categories based on the European Economic Community criteria. Then, the random forest, support vector machine, extreme gradient boosting, adaptive gradient boosting, and C5.0 decision tree algorithms and eight molecular fingerprints were used to build multiclassification base models for the acute toxicity of organic compounds. The base models were evaluated over 100 repetitions with fivefold cross-validation and external validation. The ensemble model was obtained by the voting method. The best base classifier was ExtendFP-C5.0, which achieved accuracy, sensitivity, and specificity values of 87.30%, 87.32%, and 95.76%, respectively, in external validation; the voting ensemble model achieved 96.92%, 96.93%, and 98.97%, respectively. The ensemble model achieved a higher accuracy than those reported in previous studies. Our study will help to further classify the acute toxicity of organic compounds to aquatic organisms and predict the hazard classes of organic compounds.
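The voting step and the three reported metrics can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `majority_vote` and `macro_metrics`, the toy labels, the tie-breaking rule, and the use of macro-averaged one-vs-rest sensitivity and specificity are all assumptions made for illustration, since the abstract does not specify these details.

```python
from collections import Counter


def majority_vote(predictions):
    """Hard-voting ensemble (hypothetical helper).

    predictions: list of per-model label lists, all of equal length.
    Returns one voted label per sample.
    """
    n_samples = len(predictions[0])
    voted = []
    for i in range(n_samples):
        votes = Counter(model_preds[i] for model_preds in predictions)
        top = max(votes.values())
        # Assumption: ties are broken by taking the smallest tied label.
        voted.append(min(label for label, n in votes.items() if n == top))
    return voted


def macro_metrics(y_true, y_pred, classes):
    """Accuracy plus macro-averaged one-vs-rest sensitivity and specificity.

    Assumption: macro averaging over classes; the paper may use another scheme.
    """
    pairs = list(zip(y_true, y_pred))
    n = len(pairs)
    accuracy = sum(t == p for t, p in pairs) / n
    sens, spec = [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        tn = n - tp - fn - fp
        sens.append(tp / (tp + fn) if tp + fn else 0.0)
        spec.append(tn / (tn + fp) if tn + fp else 0.0)
    return accuracy, sum(sens) / len(classes), sum(spec) / len(classes)


# Toy data (made up): four toxicity classes, three base models.
classes = [1, 2, 3, 4]
y_true = [1, 2, 3, 4, 1, 2, 3, 4]
base_preds = [
    [1, 2, 3, 4, 1, 2, 3, 3],  # base model A, one error
    [1, 2, 3, 4, 2, 2, 3, 4],  # base model B, one error
    [1, 3, 3, 4, 1, 2, 4, 4],  # base model C, two errors
]

voted = majority_vote(base_preds)
acc, sensitivity, specificity = macro_metrics(y_true, voted, classes)
print(voted, acc, sensitivity, specificity)
```

In this toy case every base model misclassifies at least one compound, but the errors fall on different samples, so the voted labels match the true labels. That is the general reason a voting ensemble can outperform its best base classifier.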