In Silico Prediction of Chemical Acute Oral Toxicity Using Multi-Classification Methods
Xiao Li, Lei Chen, Feixiong Cheng, Zengrui Wu, Hanping Bian, Congying Xu, Weihua Li, Guixia Liu, Xu Shen, Yun Tang
DOI: https://doi.org/10.1021/ci5000467
IF: 6.162
2014-04-16
Journal of Chemical Information and Modeling
Abstract: Chemical acute oral toxicity is an important end point in drug design and environmental risk assessment. However, it is difficult to determine experimentally, so in silico methods have been developed as an alternative. In this study, a comprehensive data set of 12,204 diverse compounds with median lethal dose (LD₅₀) values was compiled. These chemicals were classified into four categories (I, II, III and IV) according to the criteria of the U.S. Environmental Protection Agency (EPA). Several multiclassification models were then built by combining five machine learning methods, namely support vector machine (SVM), C4.5 decision tree (C4.5), random forest (RF), k-nearest neighbor (kNN) and naïve Bayes (NB), with MACCS and FP4 fingerprints. One-against-one (OAO) and binary tree (BT) strategies were employed for SVM multiclassification. Performance was evaluated on two external validation sets containing 1678 and 375 chemicals, respectively. The MACCS-SVM(OAO) model achieved overall accuracies of 83.0% and 89.9% on external validation sets I and II, respectively, and showed reliable predictive accuracy for each class. In addition, representative substructures responsible for acute oral toxicity were identified using information gain and substructure frequency analysis, which may help guide further studies aimed at avoiding this toxicity.
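The abstract names three computational steps: binning LD₅₀ values into EPA categories, combining pairwise SVMs by one-against-one voting, and ranking fingerprint substructures by information gain. The Python sketch below illustrates each one under stated assumptions. The category cut-offs (I ≤ 50, II ≤ 500, III ≤ 5000, IV > 5000 mg/kg) are the commonly cited EPA acute oral thresholds and are not restated in the abstract itself; the pairwise classifiers are hypothetical callables standing in for trained SVMs; and `epa_category`, `oao_predict`, `entropy` and `information_gain` are illustrative names, not the authors' code. The binary tree (BT) strategy is not shown.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, Sequence, Tuple


def epa_category(ld50_mg_per_kg: float) -> str:
    """Map an oral LD50 (mg/kg body weight) to an EPA toxicity category.

    Assumed cut-offs: I <= 50, II > 50 to <= 500, III > 500 to <= 5000, IV > 5000.
    """
    if ld50_mg_per_kg <= 50:
        return "I"
    if ld50_mg_per_kg <= 500:
        return "II"
    if ld50_mg_per_kg <= 5000:
        return "III"
    return "IV"


# Hypothetical pairwise model: takes a fingerprint and returns one of its two classes.
PairwiseModel = Callable[[Sequence[int]], str]


def oao_predict(x: Sequence[int], classes: Sequence[str],
                pairwise: Dict[Tuple[str, str], PairwiseModel]) -> str:
    """One-against-one voting over all k*(k-1)/2 binary classifiers.

    The classifier for pair (a, b) votes for a or b; the class with the most
    votes wins, and ties go to the class listed first in `classes`.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise[(a, b)](x)] += 1
    # max() returns the first maximal element, so list order breaks ties.
    return max(classes, key=lambda c: votes[c])


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def information_gain(bits: Sequence[bool], labels: Sequence[str]) -> float:
    """Information gain of one binary fingerprint bit for the class labels."""
    n = len(labels)
    gain = entropy(labels)
    for flag in (True, False):
        subset = [y for b, y in zip(bits, labels) if b == flag]
        if subset:
            gain -= len(subset) / n * entropy(subset)
    return gain


if __name__ == "__main__":
    print([epa_category(v) for v in (5, 50, 320, 2000, 8000)])
    # ['I', 'I', 'II', 'III', 'IV']

    classes = ["I", "II", "III", "IV"]
    # Toy pairwise "models" that always vote for the more toxic class of each pair.
    toy = {(a, b): (lambda x, a=a: a) for a, b in combinations(classes, 2)}
    print(oao_predict([1, 0, 1], classes, toy))  # I

    # A substructure present only in category I compounds separates the classes fully.
    print(information_gain([True, True, False, False], ["I", "I", "IV", "IV"]))  # 1.0
```

With four categories, OAO trains six binary classifiers; the voting rule shown is one common way to combine them, and ranking every fingerprint bit by `information_gain` is one way to surface candidate toxic substructures.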
Chemistry, Multidisciplinary; Chemistry, Medicinal; Computer Science, Interdisciplinary Applications; Computer Science, Information Systems