Computer‐aided Prediction of Toxicity with Substructure Pattern and Random Forest

Dong-Sheng Cao, Yan-Ning Yang, Jian-Chao Zhao, Jun Yan, Shao Liu, Qian-Nan Hu, Qing-Song Xu, Yi-Zeng Liang
DOI: https://doi.org/10.1002/cem.1416
IF: 2.5
2012-01-01
Journal of Chemometrics
Abstract: The toxicity of chemicals, induced by different factors, is an important consideration, especially during drug research and development. Thus, there is an urgent need to develop computationally effective models that can predict the toxicity or adverse effects of specific classes of chemicals. In this study, random forest (RF) was used to classify five toxicity data sets from the Distributed Structure‐Searchable Toxicity database network, using substructure fingerprints calculated directly from simple molecular structures. Three model validation approaches (out‐of‐bag validation incorporated in RF, fivefold cross‐validation, and an independent validation set) were used to assess the prediction capability of the models. The chemical space of the data sets was explored with multidimensional scaling plots, and outlying molecules were detected using the proximity measure in RF. In addition, the important substructure fingerprints identified by RF gave some insight into the structural features related to the toxicity of chemicals. The results showed that these in silico classification models based on substructure patterns and RF are applicable to the prediction of potential toxicity of chemical compounds. Copyright © 2012 John Wiley & Sons, Ltd.
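The three validation approaches named in the abstract differ mainly in how they partition the samples. The following is a minimal standard-library sketch of those partitions only, not the authors' implementation. The function names, the sample count of 100, and the 20% hold-out fraction are illustrative assumptions that do not come from the paper. In a random forest, each tree is grown on a bootstrap sample, and the molecules never drawn into that sample (the out-of-bag set) act as a test set for that tree.

```python
import random


def bootstrap_oob(n_samples, rng):
    """Draw one bootstrap sample; return (in_bag, out_of_bag) index lists."""
    # Sample n_samples indices with replacement, as done for each RF tree.
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    # Indices never drawn form the out-of-bag set for this tree.
    oob = [i for i in range(n_samples) if i not in drawn]
    return in_bag, oob


def k_fold_indices(n_samples, k, rng):
    """Shuffle the indices and deal them into k nearly equal folds."""
    idx = list(range(n_samples))
    rng.shuffle(idx)
    return [idx[f::k] for f in range(k)]


def holdout_split(n_samples, test_fraction, rng):
    """Split shuffled indices into (train, independent validation) lists."""
    idx = list(range(n_samples))
    rng.shuffle(idx)
    n_test = int(round(n_samples * test_fraction))
    return idx[n_test:], idx[:n_test]


rng = random.Random(0)   # fixed seed so the sketch is reproducible
n = 100                  # hypothetical number of molecules

in_bag, oob = bootstrap_oob(n, rng)
folds = k_fold_indices(n, 5, rng)            # fivefold cross-validation
train_idx, test_idx = holdout_split(n, 0.2, rng)

print("out-of-bag size:", len(oob))
print("fold sizes:", [len(f) for f in folds])
print("train / validation sizes:", len(train_idx), len(test_idx))
```

On average, roughly a third of the samples end up out of bag for any one tree, which is why the out-of-bag estimate needs no separate hold-out set.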