Simultaneously Optimized Support Vector Regression Combined with Genetic Algorithm for QSAR Analysis of KDR/VEGFR‐2 Inhibitors
Min Sun, Junqing Chen, Jin Cai, Meng Cao, Shuangqing Yin, Min Ji
DOI: https://doi.org/10.1111/j.1747-0285.2010.00958.x
2010-01-01
Chemical Biology & Drug Design
Abstract: Because the majority of support vector regression (SVR) models in quantitative structure-activity relationship (QSAR) studies have not been fully optimized, this study proposes an approach of simultaneous optimization and evaluates it on a set of novel kinase insert domain receptor/vascular endothelial growth factor receptor-2 (KDR/VEGFR-2) inhibitors, including naphthalene- and indazole-based compounds. After feature selection with a genetic algorithm (GA), the final SVR model was built on an optimal set of six descriptors, on which the simultaneous optimization was carried out. Specifically, the global optimum was located by a grid search over the joint parameter space defined by cost (C), gamma and epsilon. The performance of SVR for every (C, gamma, epsilon) combination was evaluated and recorded, which produced a large volume of results. Based on the data decomposition strategies described in the main paper, the best performance was achieved with C = 1.2, gamma = 0.15 and epsilon = 0.065. For comparison, a linear model based on genetic algorithm-multiple linear regression (GA-MLR) was also developed. The performance of both models was rigorously validated using leave-one-out cross-validation as well as external validation. The significantly higher R² (0.908 and 0.837) and lower root-mean-square error (RMSE; 0.237 and 0.311) for the 45 training and 16 test samples, compared with those of GA-MLR (R² 0.764 and 0.700; RMSE 0.402 and 0.421), confirm the superior performance of GA-SVR. The robustness and predictive ability of this model were further evaluated carefully. The resulting models introduce not only the idea of simultaneous optimization in SVR, but also an efficient strategy for estimating the VEGFR-2 inhibitory activity of novel naphthalene- and indazole-based compounds.
Moreover, the study provides insights into the structural features related to the biological activity of these compounds, which may help guide the design of novel, potent vascular endothelial growth factor receptor-2/kinase insert domain receptor inhibitors.
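The abstract describes a joint grid search over (C, gamma, epsilon), scored by leave-one-out (LOO) cross-validation, followed by external validation with R² and RMSE. The following is a minimal sketch of that idea under several stated assumptions. It uses scikit-learn's `SVR` (an RBF-kernel epsilon-SVR wrapping LIBSVM) as a stand-in for the authors' implementation. It runs on synthetic placeholder data with six descriptors and a 45/16 split, so the GA feature selection step is assumed to have already happened and is not reproduced. The grid values are hypothetical, and they include the reported optimum only as grid points. Descriptor standardization and the "lowest pooled LOO RMSE" selection rule are also assumptions, not the paper's documented procedure. The numbers it prints say nothing about the paper's results.

```python
import itertools

import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic placeholder data (NOT the paper's compounds): 6 descriptors,
# split 45 training / 16 test to mirror only the reported set sizes.
rng = np.random.default_rng(0)
X = rng.normal(size=(61, 6))
y = X[:, 0] - 0.5 * X[:, 1] ** 2 + 0.3 * np.sin(X[:, 2]) + rng.normal(scale=0.1, size=61)
X_train, X_test, y_train, y_test = X[:45], X[45:], y[:45], y[45:]

# Hypothetical grid; the paper's reported optimum values appear only as grid points.
C_grid = [0.5, 1.0, 1.2, 2.0, 4.0]
gamma_grid = [0.05, 0.15, 0.3, 0.6]
eps_grid = [0.01, 0.065, 0.1, 0.2]


def make_model(C, gamma, eps):
    """RBF epsilon-SVR; scaling (an assumed preprocessing step) is refit in every fold."""
    return make_pipeline(StandardScaler(), SVR(kernel="rbf", C=C, gamma=gamma, epsilon=eps))


def rmse(y_true, y_pred):
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))


# Evaluate and record every (C, gamma, epsilon) combination jointly,
# rather than tuning one parameter at a time.
records = []
for C, gamma, eps in itertools.product(C_grid, gamma_grid, eps_grid):
    y_loo = cross_val_predict(make_model(C, gamma, eps), X_train, y_train, cv=LeaveOneOut())
    records.append((rmse(y_train, y_loo), r2_score(y_train, y_loo), C, gamma, eps))

# Assumed selection criterion: the lowest pooled LOO RMSE.
best_rmse, best_q2, best_C, best_gamma, best_eps = min(records)
print(f"best C={best_C}, gamma={best_gamma}, epsilon={best_eps}: "
      f"LOO RMSE={best_rmse:.3f}, q2={best_q2:.3f}")

# Refit on the full training set, then score the external test set.
final = make_model(best_C, best_gamma, best_eps).fit(X_train, y_train)
metrics = {}
for name, X_part, y_part in [("train", X_train, y_train), ("test", X_test, y_test)]:
    pred = final.predict(X_part)
    metrics[name] = (r2_score(y_part, pred), rmse(y_part, pred))
    print(f"{name}: R2={metrics[name][0]:.3f}, RMSE={metrics[name][1]:.3f}")
```

The sketch pools the LOO predictions with `cross_val_predict` and computes R² (often written q²) and RMSE once over all 45 held-out predictions. It does not average per-fold scores, because R² is undefined on a single held-out sample. Keeping every record, rather than only the best one, mirrors the abstract's point that each combination was "evaluated and recorded".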