Estimation of soil organic carbon normalized sorption coefficient (Koc) using least squares-support vector machine

Bin Wang, Jingwen Chen, Xuehua Li, Ya-nan Wang, Lin Chen, Min Zhu, Haiying Yu, Ralph Kühne, Gerrit Schüürmann
DOI: https://doi.org/10.1002/qsar.200860065
2009-01-01
QSAR & Combinatorial Science
Abstract: A least squares-support vector machine (LS-SVM) was used to derive a quantitative structure-activity relationship (QSAR) model for predicting the soil sorption coefficient normalized to organic carbon, Koc, from 24 fragment-specific increments and four further molecular descriptors, employing a training set of 571 organic compounds and three external validation sets. The combinational parameters of the LS-SVM were optimized by an adaptive random search technique (ARST), which could locate the optimal parameter combination in the solution space simply and quickly. The developed LS-SVM model was compared with a model established by multiple linear regression (MLR) analysis using the same data sets. Generally, the LS-SVM model performed slightly better than the MLR model with respect to goodness-of-fit, predictivity, and applicability domain (AD). The ADs of the LS-SVM and MLR models were described on the basis of leverages and standardized residuals. Both models had wide ADs within a given reliability (standardized residual < 3 SE units), but the LS-SVM model was superior for compounds with high leverages.
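As a rough illustration of the two techniques named in the abstract, the following sketch fits an LS-SVM regressor by solving its standard linear system and then computes leverages and standardized residuals for an applicability-domain check. It is not the authors' implementation: the data are synthetic stand-ins for the paper's descriptors, the RBF kernel is an assumption (the abstract does not name the kernel), the values of gamma and sigma are arbitrary placeholders rather than ARST-optimized values, and the warning leverage 3(p + 1)/n is a common convention that the abstract does not state.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    # Squared Euclidean distance between every row of A and every row of B.
    d2 = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def lssvm_fit(X, y, gamma, sigma):
    # LS-SVM regression reduces to one linear system:
    # [[0, 1^T], [1, K + I/gamma]] [b, alpha]^T = [0, y]^T
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]  # bias b, dual weights alpha

def lssvm_predict(X_train, b, alpha, sigma, X_new):
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

def leverages(X_train, X_query):
    # h_i = x_i^T (X^T X)^{-1} x_i, with an intercept column added.
    Xt = np.column_stack([np.ones(len(X_train)), X_train])
    Xq = np.column_stack([np.ones(len(X_query)), X_query])
    G = np.linalg.pinv(Xt.T @ Xt)
    return np.einsum("ij,jk,ik->i", Xq, G, Xq)

# Synthetic data (hypothetical; NOT the 571-compound training set).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, -0.5, 0.25]) + 0.1 * rng.normal(size=60)

gamma, sigma = 100.0, 2.0  # placeholder values, not tuned by ARST
b, alpha = lssvm_fit(X, y, gamma, sigma)
y_fit = lssvm_predict(X, b, alpha, sigma, X)

# Applicability-domain quantities for the training compounds.
h = leverages(X, X)
resid = y - y_fit
std_resid = resid / resid.std(ddof=1)  # simple standardization (assumption)
h_star = 3.0 * (X.shape[1] + 1) / X.shape[0]  # conventional warning leverage
in_domain = (np.abs(std_resid) < 3.0) & (h <= h_star)
```

Plotting `std_resid` against `h` gives the usual Williams plot; compounds with `h` above `h_star` are the high-leverage cases for which the abstract reports the LS-SVM model to be superior.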