In silico evaluation of logD7.4 and comparison with other prediction methods

Jianbing Wang, Dongsheng Cao, Minfeng Zhu, Yonghuan Yun, Nan Xiao, Yizeng Liang
DOI: https://doi.org/10.1002/cem.2718
IF: 2.5
2015-01-01
Journal of Chemometrics
Abstract: Lipophilicity, evaluated by either the n-octanol/water partition coefficient or the n-octanol/buffer solution distribution coefficient, is of high importance in pharmacology, toxicology, and medicinal chemistry. A quantitative structure-property relationship study was carried out to predict distribution coefficients at pH 7.4 (logD(7.4)) for a large data set of 1130 organic compounds. Partial least squares and support vector machine (SVM) regressions were employed to build prediction models with 30 molecular descriptors selected by a genetic algorithm. The results demonstrated that the SVM model is more reliable and has better prediction performance than the partial least squares model. The squared correlation coefficients of fitting, cross validation, and prediction are 0.92, 0.90, and 0.89, respectively. The corresponding root mean square errors are 0.52, 0.59, and 0.56, respectively. The robustness, reliability, and generalization ability of the model were assessed by a Y-randomization test and an applicability domain analysis. When compared with logD(7.4) values calculated by five existing methods from Discovery Studio and ChemAxon, our SVM model outperforms them. The results indicate that our model can give a reliable and robust prediction of logD(7.4). Copyright (c) 2015 John Wiley & Sons, Ltd.
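The abstract reports squared correlation coefficients, root mean square errors, and a Y-randomization test. The following is a minimal sketch of how such quantities can be computed, not the authors' implementation. It rests on several assumptions: the data are synthetic and made up for illustration, the model is a one-descriptor ordinary least squares line standing in for the SVM regression on 30 descriptors, and every function name (`rmse`, `r_squared`, `fit_line`, `fitted_r2`, `y_randomization`) is hypothetical.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(x, y):
    """Ordinary least squares on one descriptor.

    ASSUMPTION: a stand-in for the paper's SVM model, used only to keep
    the sketch dependency-free.
    """
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def fitted_r2(x, y):
    """Fit the stand-in model and return its R^2 on the same data."""
    slope, intercept = fit_line(x, y)
    return r_squared(y, [slope * a + intercept for a in x])


def y_randomization(x, y, n_rounds=100, seed=0):
    """Refit on shuffled responses; a sound model should score far lower."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        shuffled = y[:]          # copy so the real response is untouched
        rng.shuffle(shuffled)    # break the descriptor-response link
        scores.append(fitted_r2(x, shuffled))
    return scores


# ASSUMPTION: synthetic descriptor and response values, not the paper's data.
data_rng = random.Random(42)
x = [data_rng.uniform(-2.0, 6.0) for _ in range(200)]
y = [0.8 * a - 0.5 + data_rng.gauss(0.0, 0.5) for a in x]

true_r2 = fitted_r2(x, y)
random_r2 = y_randomization(x, y)
print(f"R2 on real response: {true_r2:.2f}")
print(f"mean R2 after shuffling: {sum(random_r2) / len(random_r2):.3f}")
```

If the R² values obtained on shuffled responses stay well below the value on the real response, the fit is unlikely to be a chance correlation, which is the reasoning behind a Y-randomization test.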