In Silico Toxicity Prediction by Support Vector Machine and SMILES Representation-Based String Kernel.

D.-S. Cao, J.-C. Zhao, Y.-N. Yang, C.-X. Zhao, J. Yan, S. Liu, Q.-N. Hu, Q.-S. Xu, Y.-Z. Liang
DOI: https://doi.org/10.1080/1062936x.2011.645874
IF: 3.681
2012-01-01
SAR and QSAR in Environmental Research
Abstract: There is a great need to assess the harmful effects, or toxicities, of chemicals to which humans are exposed. In the present paper, a string kernel based on the simplified molecular input line entry specification (SMILES) representation was used together with the support vector machine (SVM) algorithm to classify the toxicity of chemicals from the US Environmental Protection Agency Distributed Structure-Searchable Toxicity (DSSTox) database network. In this method, the molecular structure is encoded directly as a series of SMILES substrings that represent the presence of particular chemical elements and different kinds of chemical bonds (double, triple and stereochemistry) in the molecule. The SMILES string kernel can thus measure the similarity of molecules directly from the local information contained in their SMILES strings. Two model validation approaches, five-fold cross-validation and an independent validation set, were used to assess the predictive capability of the developed models. The results indicate that an SVM based on the SMILES string kernel is a promising alternative modelling approach for predicting the potential toxicity of chemicals.
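The idea of comparing molecules through shared SMILES substrings can be illustrated with a minimal sketch. The abstract does not give the exact kernel definition, so the code below assumes a simple normalized spectrum-style kernel that counts all contiguous substrings up to a fixed length. The function names, the `max_len` parameter and the example molecules are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import sqrt


def substring_counts(smiles, max_len=3):
    """Count every contiguous substring of length 1..max_len in a SMILES string."""
    counts = Counter()
    for k in range(1, max_len + 1):
        for i in range(len(smiles) - k + 1):
            counts[smiles[i:i + k]] += 1
    return counts


def spectrum_kernel(a, b, max_len=3):
    """Cosine-normalized dot product of substring counts (assumed kernel form)."""
    ca = substring_counts(a, max_len)
    cb = substring_counts(b, max_len)
    # Only substrings present in both strings contribute to the dot product.
    dot = sum(n * cb[s] for s, n in ca.items())
    norm = sqrt(sum(n * n for n in ca.values()) * sum(n * n for n in cb.values()))
    return dot / norm if norm else 0.0


def gram_matrix(smiles_list, max_len=3):
    """Pairwise kernel values for a list of SMILES strings."""
    return [[spectrum_kernel(a, b, max_len) for b in smiles_list]
            for a in smiles_list]


# Illustrative molecules only: ethanol, ethylamine, ethene, benzene.
molecules = ["CCO", "CCN", "C=C", "c1ccccc1"]
K = gram_matrix(molecules)
```

A Gram matrix of this kind could be passed to any SVM implementation that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`. Whether the authors used such a setup is not stated in the abstract.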