Incorporating variable importance into kernel PLS for modeling the structure–activity relationship

Xin Huang, Yi-Ping Luo, Qing-Song Xu, Yi-Zeng Liang
DOI: https://doi.org/10.1007/s10910-017-0826-9
2017-01-01
Journal of Mathematical Chemistry
Abstract: Kernel partial least squares (KPLS), a nonlinear extension of linear PLS, has become a popular technique for chemical and biological modeling. Training samples are transformed into a feature space via a nonlinear mapping, and the PLS algorithm is then carried out in that feature space. However, one of the main limitations of KPLS is that each feature is given the same importance in the kernel matrix, which explains the poor performance of KPLS on data with many irrelevant features. In this study, we propose a new strategy that incorporates variable importance into KPLS, termed the WKPLS approach. By modifying the kernel matrix, WKPLS provides a feasible way to differentiate between true and noise variables. Because the regression coefficients of a PLS model reflect the importance of variables, we first obtain normalized regression coefficients by building a PLS model with all the variables. Variable importance is then incorporated into the primary kernel. The performance of WKPLS is investigated with one simulated dataset and two structure–activity relationship (SAR) datasets. The results show that WKPLS yields better prediction performance than standard linear kernel PLS and Gaussian kernel PLS. WKPLS can be considered a good mechanism for improving the performance of KPLS in SAR modeling by introducing extra information.
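The abstract does not spell out how the normalized coefficients enter the kernel. The following is a minimal sketch of one plausible reading, assuming each variable's squared difference in a Gaussian kernel is scaled by its normalized absolute PLS coefficient. The function names, the toy data, and the coefficient values are all hypothetical, and the coefficients are taken as given rather than fitted here.

```python
import math

def normalize_weights(coefs):
    """Scale absolute regression coefficients so that they sum to 1.
    (Assumed normalization; the paper may use a different one.)"""
    abs_coefs = [abs(c) for c in coefs]
    total = sum(abs_coefs)
    if total == 0:
        raise ValueError("all coefficients are zero")
    return [a / total for a in abs_coefs]

def weighted_gaussian_kernel(x, z, w, sigma=1.0):
    """Gaussian kernel in which variable j contributes w[j] * (x[j] - z[j])**2."""
    d2 = sum(wj * (xj - zj) ** 2 for wj, xj, zj in zip(w, x, z))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def kernel_matrix(X, w, sigma=1.0):
    """Pairwise weighted kernel matrix for the rows of X."""
    return [[weighted_gaussian_kernel(xi, xj, w, sigma) for xj in X] for xi in X]

# Toy data: the third column is pure noise with a large scale.
X = [[1.0, 2.0, 50.0],
     [1.1, 2.1, -40.0],
     [3.0, 0.5, 45.0]]

# Hypothetical coefficients from a PLS model fitted on all variables;
# the noise variable receives a coefficient of zero.
coefs = [0.8, -1.2, 0.0]

w = normalize_weights(coefs)
K_weighted = kernel_matrix(X, w)
K_plain = kernel_matrix(X, [1.0, 1.0, 1.0])  # every variable weighted equally
```

With equal weights, the noise column dominates the distance between the first two samples and their kernel value collapses toward zero. With the coefficient-based weights, the same two samples are recognized as near neighbours, which illustrates the kind of effect the abstract attributes to WKPLS.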