A Novel Combinatorial Forecast Method Based on Support Vector Machine Regression and K-Near Neighbor Group and Its Application in QSAR

YUAN Zhe-ming, XIONG Jie-yi, ZHANG Yong-sheng
2007-01-01
Abstract: To improve prediction precision in quantitative structure-activity relationship (QSAR) research, a novel nonlinear combinatorial forecast method based on support vector machine regression (SVR) and k-nearest-neighbor groups is proposed. First, the descriptors are screened with SVR using the leave-one-out method and the minimum mean squared error (MSE) criterion, yielding the optimal kernel and the corresponding retained descriptors. Second, the heterogeneity of the sample set is characterized by the prediction values of different k-nearest-neighbor groups, computed from the Euclidean distances between the retained-descriptor vectors of the test samples and those of the training samples. Third, these sub-models (the prediction values of the different k-nearest-neighbor groups) are screened with SVR using the leave-one-out method and the minimum-MSE criterion, yielding the optimal kernel and the corresponding retained sub-models. Finally, the combinatorial forecast is carried out by a dual leave-one-out method based on the retained sub-models. QSAR results for the toxicity of substituted anilines and phenols to Daphnia magna Straus showed that the new combination model had the highest prediction precision among all reference models and captured the nonlinear relationships between toxicity and the descriptors. Its advantages include structural risk minimization, nonlinear modeling, avoidance of over-fitting, strong generalization ability and high prediction precision. The new combination model can therefore be widely used in QSAR.
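The second step, building one sub-model per k-nearest-neighbor group, can be illustrated with a minimal sketch. The abstract does not say how a group's prediction value is computed, so the sketch assumes it is the mean activity of the k nearest other samples under Euclidean distance, with each sample held out in turn. The function names (`knn_group_predictions`, `loo_mse`) and the toy data are hypothetical and not taken from the paper. The later SVR screening and combination steps are not shown.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_group_predictions(X, y, ks):
    """Leave-one-out k-NN group predictions (assumed: mean of neighbors).

    Returns one row per sample; row i holds one prediction per k in ks,
    computed from the k nearest *other* samples (sample i is held out).
    """
    n = len(X)
    rows = []
    for i in range(n):
        # Rank all other samples by distance to sample i.
        neighbors = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: euclidean(X[i], X[j]),
        )
        rows.append([sum(y[j] for j in neighbors[:k]) / k for k in ks])
    return rows

def loo_mse(preds, y):
    """Mean squared error of leave-one-out predictions."""
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)

# Toy data: one descriptor per sample, made up for illustration only.
X = [[0.0], [1.0], [2.0], [10.0]]
y = [0.0, 1.0, 2.0, 10.0]
ks = [1, 2]

sub_models = knn_group_predictions(X, y, ks)
for col, k in enumerate(ks):
    print(k, loo_mse([row[col] for row in sub_models], y))
```

Each column of `sub_models` is a candidate sub-model. In the method described above, these columns would then be screened and combined by SVR under the minimum-MSE criterion rather than compared one at a time as in this sketch.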