Iteratively variable subset optimization for multivariate calibration

Weiting Wang, Yonghuan Yun, Baichuan Deng, Wei Fan, Yizeng Liang
DOI: https://doi.org/10.1039/c5ra08455e
IF: 4.036
2015-01-01
RSC Advances
Abstract: Based on the theory that a large partial least squares (PLS) regression coefficient on autoscaled data indicates an important variable, a novel variable selection strategy called iteratively variable subset optimization (IVSO) is proposed in this study. The optimal number of latent variables chosen by cross-validation strongly affects the regression coefficients, which can sometimes differ by several orders of magnitude. To remove this influence, the regression coefficients generated in every sub-model are normalized. In each iterative round, the regression coefficients of each variable obtained from the sub-models are summed to evaluate its importance. A two-step procedure comprising weighted binary matrix sampling (WBMS) and sequential addition is employed to eliminate uninformative variables gradually and gently in a competitive way, which reduces the risk of losing important variables. Thus, IVSO can achieve high stability. Evaluated on one simulated dataset and two NIR datasets, IVSO shows much better prediction ability than two other outstanding and commonly used methods, Monte Carlo uninformative variable elimination (MC-UVE) and competitive adaptive reweighted sampling (CARS). The MATLAB code for implementing IVSO is available in the ESI.
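The normalization, summation, and sampling steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' MATLAB implementation. The abstract does not state the normalization formula or how WBMS draws its matrix, so two choices here are assumptions: coefficients are scaled by the largest absolute value in their sub-model, and each variable is included independently with probability equal to its weight. The function names are hypothetical, and the PLS fitting and sequential addition steps are omitted.

```python
import random


def normalize_coefficients(coefs):
    """Scale one sub-model's absolute coefficients so the largest equals 1.

    Assumption: max-abs scaling; the abstract only says coefficients are normalized.
    """
    abs_coefs = [abs(c) for c in coefs]
    peak = max(abs_coefs)
    if peak == 0:
        return [0.0] * len(coefs)
    return [a / peak for a in abs_coefs]


def summed_importance(sub_models, n_vars):
    """Sum normalized coefficients per variable over all sub-models.

    sub_models is a list of (selected_indices, coefficients) pairs, where
    coefficients[k] belongs to variable selected_indices[k].
    """
    totals = [0.0] * n_vars
    for indices, coefs in sub_models:
        for j, value in zip(indices, normalize_coefficients(coefs)):
            totals[j] += value
    return totals


def weighted_binary_matrix(weights, n_rows, rng):
    """Draw a binary matrix whose entry (i, j) is 1 with probability weights[j].

    Assumption: independent Bernoulli draws per entry. Each row selects the
    variable subset for one sub-model.
    """
    return [[1 if rng.random() < w else 0 for w in weights] for _ in range(n_rows)]


# Two hypothetical sub-models whose raw coefficients differ by orders of magnitude.
sub_models = [
    ([0, 1, 2], [0.002, -0.004, 0.001]),
    ([1, 2, 3], [150.0, 75.0, -300.0]),
]
importance = summed_importance(sub_models, n_vars=4)

# Assumed weight update: rescale importance to [0, 1] for the next sampling round.
weights = [value / max(importance) for value in importance]
sampling_matrix = weighted_binary_matrix(weights, n_rows=5, rng=random.Random(0))
```

After normalization both sub-models contribute on the same scale, so the second sub-model's large raw coefficients do not dominate the summed importance.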