A New Strategy of Least Absolute Shrinkage and Selection Operator Coupled with Sampling Error Profile Analysis for Wavelength Selection
Ruoqiu Zhang, Feiyu Zhang, Wanchao Chen, Heming Yao, Jiong Ge, Shengchao Wu, Ting Wu, Yiping Du
DOI: https://doi.org/10.1016/j.chemolab.2018.02.007
IF: 4.175
2018-01-01
Chemometrics and Intelligent Laboratory Systems
Abstract: A new strategy that combines sampling error profile analysis (SEPA) with the least absolute shrinkage and selection operator (SEPA-LASSO) was proposed. LASSO has been proven effective for multivariate calibration with automatic variable selection on high-dimensional data. However, in previous research, the critical step of multivariate calibration by LASSO was the optimization of the 1-norm tuning parameter for a fixed sample set, without considering how variable selection behaves across different subsets of samples. In the present work, Monte Carlo sampling (MCS), the core of the SEPA framework, is used to investigate various sub-models. Least angle regression (LAR) is used to solve LASSO, so that LAR iterations containing a given number of variables can be obtained directly instead of choosing numerical values of the 1-norm tuning parameter. The SEPA-LASSO algorithm consists of many loops. Under the SEPA framework and the LAR algorithm, each loop uses MCS to build a number of LASSO sub-models with the same dimension. A vote rule then determines the importance of the variables and selects them to build a variable subset. After all loops have run, several variable subsets are obtained, and their error profiles are used to choose the optimal subset. The performance of SEPA-LASSO was evaluated on three near-infrared (NIR) datasets. The results show that the model built by SEPA-LASSO has excellent predictability and interpretability. The comparison covered commonly used multivariate calibration methods, namely principal component regression (PCR) and partial least squares (PLS). It also covered several wavelength selection methods: LASSO, moving window partial least squares regression (MWPLSR), Monte Carlo uninformative variable elimination (MC-UVE), ordered homogeneity pursuit lasso (OHPL), and stability competitive adaptive reweighted sampling (SCARS).
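The loop structure described in the abstract can be sketched in Python with NumPy. This is a minimal, hypothetical illustration and not the authors' implementation.

Several simplifications are involved. The helper `lar_entry_order` runs plain least angle regression and omits the LASSO drop-step modification, so it only records the order in which the first k variables enter. The wrapper `sepa_lasso_sketch` and its defaults are all assumptions: the 80% Monte Carlo sampling fraction, the 50 sub-models per loop, and the "top-k by vote count" rule. Likewise, scoring each subset by the median held-out RMSE of an ordinary least squares fit is an assumed stand-in for the paper's error-profile analysis. The data are synthetic.

```python
import numpy as np


def lar_entry_order(X, y, k):
    """Indices of the first k variables entering a plain LAR path.

    Simplification (assumption): no LASSO drop step, so this is LAR,
    used only as a stand-in for the LAR-based LASSO solver.
    """
    X = X - X.mean(axis=0)
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0
    X = X / norms                      # centred, unit-norm columns
    y = y - y.mean()
    n, p = X.shape
    k = min(k, p, n - 1)
    mu = np.zeros(n)                   # current fitted response
    active = [int(np.argmax(np.abs(X.T @ y)))]
    while len(active) < k:
        c = X.T @ (y - mu)             # current correlations
        C = np.max(np.abs(c[active]))
        XA = X[:, active] * np.sign(c[active])
        G_inv_1 = np.linalg.solve(XA.T @ XA, np.ones(len(active)))
        A = 1.0 / np.sqrt(G_inv_1.sum())
        u = XA @ (A * G_inv_1)         # equiangular direction
        a = X.T @ u
        inactive = np.setdiff1d(np.arange(p), active)
        with np.errstate(divide="ignore", invalid="ignore"):
            g = np.concatenate([(C - c[inactive]) / (A - a[inactive]),
                                (C + c[inactive]) / (A + a[inactive])])
        g[~(g > 1e-12)] = np.inf       # keep strictly positive steps only
        idx = int(np.argmin(g))
        if not np.isfinite(g[idx]):
            break
        mu = mu + g[idx] * u
        active.append(int(inactive[idx % len(inactive)]))
    return active


def sepa_lasso_sketch(X, y, dims, n_mc=50, frac=0.8, seed=0):
    """Hypothetical SEPA-LASSO-style loop; all defaults are assumptions."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    n_train = int(frac * n)
    splits = [rng.permutation(n) for _ in range(n_mc)]  # Monte Carlo sampling
    results = []
    for k in dims:                     # one loop per sub-model dimension
        votes = np.zeros(p)
        for perm in splits:            # sub-models of equal dimension k
            tr = perm[:n_train]
            votes[lar_entry_order(X[tr], y[tr], k)] += 1
        # Assumed vote rule: keep the k most frequently selected variables.
        subset = np.sort(np.argsort(-votes, kind="stable")[:k])
        errs = []
        for perm in splits:            # error profile of this subset
            tr, te = perm[:n_train], perm[n_train:]
            Xtr = np.column_stack([np.ones(len(tr)), X[tr][:, subset]])
            beta, *_ = np.linalg.lstsq(Xtr, y[tr], rcond=None)
            Xte = np.column_stack([np.ones(len(te)), X[te][:, subset]])
            errs.append(np.sqrt(np.mean((Xte @ beta - y[te]) ** 2)))
        results.append((float(np.median(errs)), k, subset))
    best = min(results, key=lambda r: r[0])  # lowest median RMSE wins
    return best[2], results


# Synthetic demo: only columns 4 and 11 carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 30))
y = 3 * X[:, 4] - 2 * X[:, 11] + 0.1 * rng.normal(size=60)
subset, profile = sepa_lasso_sketch(X, y, dims=[1, 2, 3, 5])
print("selected variables:", subset)
```

Fixing the model dimension k in each loop mirrors the abstract's point. Stopping LAR after a set number of entered variables replaces a search over numerical values of the 1-norm tuning parameter.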