Ensemble multiclassification model for predicting developmental toxicity in zebrafish

Gaohua Liu, Xinran Li, Yaxu Guo, Li Zhang, Hongsheng Liu, Haixin Ai
DOI: https://doi.org/10.1016/j.aquatox.2024.106936
Abstract: In recent years, with the rapid development of society, organic compounds have been released into aquatic environments in various forms, posing a significant threat to the survival of aquatic organisms. The assessment of developmental toxicity is an important part of environmental safety risk systems: it helps identify the potential impacts of organic compounds on the embryonic development of aquatic organisms and enables early detection and warning of potential ecological risks. Binary classification models, however, cannot classify organic compounds in sufficient detail. It is therefore crucial to construct a multiclassification model for predicting the developmental toxicity of organic compounds. In this study, binary and multiclassification models were developed based on the ToxCast™ Phase I chemical library and literature data. The random forest, support vector machine, extreme gradient boosting, adaptive gradient boosting, and C5.0 decision tree algorithms, together with eight types of molecular fingerprints, were used to establish multiclassification base models for predicting developmental toxicity, evaluated through 5-fold cross-validation and external validation. A multiclassification ensemble model was then derived through a voting method. The balanced accuracy of the binary ensemble model was 0.918, while that of the multiclassification model was 0.819. The developmental toxicity voting ensemble model (DT-VEM) achieved accuracies of 0.804, 0.834, and 0.855. Furthermore, by using the XGBoost machine learning algorithm to construct separate models for molecular descriptors and substructure molecular fingerprints, we identified several substructures and physical properties related to developmental toxicity. Our research contributes to a more detailed classification of developmental toxicity, providing a new and valuable tool for predicting the developmental toxicity effects of unknown compounds.
This supplement addresses the limitations of previous tools, as it offers an enhanced ability to predict potential developmental toxicity in novel compounds.
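As a rough illustration of the two ideas the abstract relies on, combining base-model predictions by voting and scoring with balanced accuracy, the following minimal Python sketch may help. It is not the authors' implementation: the abstract does not say whether voting is hard or soft, so plurality (hard) voting is assumed here, and the function names, the tie-breaking rule, and the toy class labels 0, 1, 2 are all hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample by plurality vote.

    predictions: list of lists, one inner list of class labels per base model.
    Assumption: ties go to the label predicted by the earliest model in the list.
    """
    n_samples = len(predictions[0])
    voted = []
    for i in range(n_samples):
        votes = [model_preds[i] for model_preds in predictions]
        counts = Counter(votes)
        best = max(counts.values())
        # Pick the first label (in model order) that reaches the top vote count.
        voted.append(next(v for v in votes if counts[v] == best))
    return voted


def balanced_accuracy(y_true, y_pred):
    """Balanced accuracy: the unweighted mean of per-class recall."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


# Toy data (hypothetical): three base models, six compounds, three classes.
y_true = [0, 0, 1, 1, 2, 2]
model_a = [0, 0, 1, 2, 2, 2]
model_b = [0, 1, 1, 1, 2, 0]
model_c = [0, 0, 0, 1, 2, 2]

ensemble = majority_vote([model_a, model_b, model_c])
score = balanced_accuracy(y_true, ensemble)
```

In this toy case each base model makes one error on a different compound, so the vote corrects all of them. Balanced accuracy averages recall over classes, which keeps a majority class from dominating the score when the classes are unevenly sized.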