Adaptive weighted least square support vector machine regression integrated with outlier detection and its application in QSAR
Wentong Cui,Xuefeng Yan
DOI: https://doi.org/10.1016/j.chemolab.2009.05.008
IF: 4.175
2009-01-01
Chemometrics and Intelligent Laboratory Systems
Abstract: To eliminate the influence of unavoidable outliers in the training sample on a model's performance, a novel least square support vector machine regression method is proposed that combines an outlier detection approach with adaptive weight values for the training sample; it is named adaptive weighted least square support vector machine regression (AWLS-SVM). First, the robust 3σ principle is used to detect marked outliers in the training sample. Second, least square support vector machine regression (LS-SVM) is applied to the training sample with the marked outliers removed to develop a model, and the fitting error of each sample point is obtained. Third, an initial weight is calculated for each sample point from its fitting error: the larger the fitting error, the smaller the weight. Thus potential outliers, which are not detected by the robust 3σ principle but have larger fitting errors, receive smaller weights, which reduces their influence on the model's performance. LS-SVM is then applied to the weighted sample to develop the model again. Finally, the proposed iterative weighting method is repeated until the weight values of the training sample converge, yielding a model with good predictive performance. To illustrate the performance of AWLS-SVM, a simulation experiment is designed that produces a training sample with a marked outlier and several unmarked outliers. AWLS-SVM, AWLS-SVM without the robust 3σ principle, LS-SVM with the robust 3σ principle, LS-SVM, and a radial basis function network are each used to develop a model from the designed sample. The results show that AWLS-SVM eliminates the influence of both marked and unmarked outliers on the model's performance and has the best predictive performance of the five methods.
Furthermore, AWLS-SVM was applied to develop a quantitative structure–activity relationship (QSAR) model of HIV-1 protease inhibitors, and a satisfactory result was obtained.
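
The procedure described in the abstract can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not give the exact form of the robust 3σ test, the weight formula, or the convergence criterion. Here the robust 3σ test is assumed to act on the response values using the median and a MAD-based scale, the weight rule is a Hampel-style piecewise-linear function borrowed from the general weighted LS-SVM literature as a stand-in, and the iteration stops when the largest weight change falls below a tolerance. All function names and parameters (`awls_svm`, `hampel_weights`, `C`, `gamma`, `c1`, `c2`, `tol`) are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def robust_scale(r):
    """MAD-based estimate of the standard deviation (assumed 'robust sigma')."""
    return 1.4826 * np.median(np.abs(r - np.median(r)))

def fit_lssvm(X, y, v, C=100.0):
    """Solve the weighted LS-SVM linear system; returns (alpha, b)."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                      # constraint: sum(alpha) = 0
    A[1:, 0] = 1.0                      # bias column
    A[1:, 1:] = rbf_kernel(X, X) + np.diag(1.0 / (C * v))
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[1:], sol[0]

def predict(X_train, alpha, b, X_new):
    """Evaluate the fitted LS-SVM model at the rows of X_new."""
    return rbf_kernel(X_new, X_train) @ alpha + b

def hampel_weights(e, c1=2.5, c2=3.0, floor=1e-4):
    """Assumed weight rule: larger standardized error gives smaller weight."""
    z = np.abs(e) / max(robust_scale(e), 1e-12)
    v = np.where(z <= c1, 1.0, (c2 - z) / (c2 - c1))
    return np.clip(v, floor, 1.0)       # weights stay strictly positive

def awls_svm(X, y, C=100.0, tol=1e-3, max_iter=20):
    """Sketch of the AWLS-SVM loop: screen, fit, reweight, refit."""
    # Step 1 (assumption): drop marked outliers with a robust 3-sigma test on y.
    keep = np.abs(y - np.median(y)) <= 3.0 * robust_scale(y)
    Xk, yk = X[keep], y[keep]
    # Step 2: start from an unweighted LS-SVM fit (all weights equal to 1).
    v = np.ones(len(yk))
    for _ in range(max_iter):
        alpha, b = fit_lssvm(Xk, yk, v, C)
        e = yk - predict(Xk, alpha, b, Xk)   # fitting errors
        # Step 3: recompute the weights from the fitting errors.
        v_new = hampel_weights(e)
        converged = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if converged:
            break
    # Final fit with the last set of weights.
    alpha, b = fit_lssvm(Xk, yk, v, C)
    return keep, v, (Xk, alpha, b)
```

Calling `keep, v, model = awls_svm(X, y)` on an `(n, 1)` input array and a length-`n` response returns the mask of retained points, their final weights, and a tuple that can be passed on as `predict(*model, X_new)`. The weight floor keeps the diagonal term `1 / (C * v)` finite, so strongly down-weighted points are effectively ignored without being removed from the linear system.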