Classification of a diverse set of Tetrahymena pyriformis toxicity chemical compounds from molecular descriptors by statistical learning methods.

Y Xue, H Li, C Y Ung, C W Yap, Y Z Chen
DOI: https://doi.org/10.1021/tx0600550
2006-01-01
Chemical Research in Toxicology
Abstract: The toxicity of various compounds has been measured in many studies by their toxic effects against Tetrahymena pyriformis. Efforts have also been made to use computational quantitative structure-activity relationship (QSAR) and statistical learning methods (SLMs) to predict Tetrahymena pyriformis toxicity (TPT) with impressive accuracy. Because of the diversity of compounds and toxicity mechanisms, it is desirable to explore additional methods and to examine whether these methods are applicable to more diverse sets of compounds. We tested several SLMs (logistic regression, C4.5 decision tree, k-nearest neighbor, probabilistic neural network, and support vector machines) for their capability in predicting TPT by using 1129 compounds (841 TPT and 288 non-TPT agents), which are more diverse than those in other studies. A feature selection method was used to improve prediction performance and to select the molecular descriptors responsible for distinguishing TPT from non-TPT agents. Based on 5-fold cross-validation, the prediction accuracies are 86.9% to 94.2% for TPT agents and 71.2% to 87.5% for non-TPT agents, which are comparable to those of some earlier studies despite the use of a more diverse set of compounds. The selected molecular descriptors are consistent with those used in other studies and with experimental findings. These results suggest that SLMs are useful for predicting the TPT potential of diverse sets of compounds and for characterizing the molecular descriptors associated with TPT.
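The evaluation described in the abstract, 5-fold cross-validation with accuracy reported separately for TPT and non-TPT agents, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' code: the data are synthetic two-dimensional points rather than molecular descriptors, the class sizes (84 and 29) are only a scaled-down echo of the 841:288 ratio, a plain k-nearest-neighbor vote stands in for the five SLMs, and the function names (`make_toy_data`, `stratified_folds`, `knn_predict`, `cross_validate`) are hypothetical.

```python
import random


def make_toy_data(n_pos=84, n_neg=29, seed=0):
    """Synthetic stand-in data (NOT the paper's compounds): label 1 = 'TPT', 0 = 'non-TPT'."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_pos):
        X.append((rng.gauss(2.0, 0.5), rng.gauss(2.0, 0.5)))
        y.append(1)
    for _ in range(n_neg):
        X.append((rng.gauss(-2.0, 0.5), rng.gauss(-2.0, 0.5)))
        y.append(0)
    return X, y


def stratified_folds(labels, n_folds=5, seed=0):
    """Split sample indices into n_folds groups with a similar class ratio in each."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        idx = [i for i, label in enumerate(labels) if label == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    return folds


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote over the k nearest training points (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = [label for _, label in dists[:k]]
    return 1 if 2 * sum(votes) > len(votes) else 0


def cross_validate(X, y, n_folds=5, k=3, seed=0):
    """Return (accuracy on class 1, accuracy on class 0) pooled over all held-out folds."""
    correct = {0: 0, 1: 0}
    total = {0: 0, 1: 0}
    for held_out in stratified_folds(y, n_folds, seed):
        held = set(held_out)
        train_X = [X[i] for i in range(len(X)) if i not in held]
        train_y = [y[i] for i in range(len(y)) if i not in held]
        for i in held_out:
            total[y[i]] += 1
            if knn_predict(train_X, train_y, X[i], k) == y[i]:
                correct[y[i]] += 1
    return correct[1] / total[1], correct[0] / total[0]


X, y = make_toy_data()
acc_tpt, acc_non_tpt = cross_validate(X, y)
print(f"accuracy on TPT class: {acc_tpt:.3f}")
print(f"accuracy on non-TPT class: {acc_non_tpt:.3f}")
```

Reporting the two per-class accuracies separately (often called sensitivity and specificity) matters when the classes are imbalanced, as they are here: a single overall accuracy could look high even if the smaller non-TPT class were predicted poorly.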