A novel hybrid variable selection strategy with application to molecular spectroscopic analysis
Jiaji Zhu, Xin Jiang, Qianjin Wang, Jizhong Wu, Shengde Wu, Xiaojing Chen, Quansheng Chen
DOI: https://doi.org/10.1016/j.chemolab.2023.104795
IF: 4.175
2023-05-15
Chemometrics and Intelligent Laboratory Systems
Abstract: In chemometrics, partial least squares (PLS) regression has become an established tool for spectroscopic analysis. Even so, variable selection is often necessary in molecular spectroscopic analysis to improve the performance of multivariate calibration models. However, relying on a single variable selection method has limitations. For example, individual wavelength selection methods may suffer from large computational requirements, while wavelength interval selection methods may retain uninformative or interfering wavelengths and be affected by collinearity within the selected intervals. Accordingly, a novel hybrid variable selection strategy, called memetic algorithm-interval partial least squares coupled with Hilbert-Schmidt independence criterion based variable space iterative optimization (MA-iPLS + HSIC–VSIO), is proposed in this study. In the first step, a wavelength interval selection method, MA-iPLS, is used to select optimal wavelength intervals. In the second step, a novel individual wavelength selection method, HSIC–VSIO, is employed to further optimize the selected wavelength intervals. The hybrid strategy thus combines the interval-level search of MA-iPLS with the wavelength-level refinement of HSIC–VSIO. To investigate the performance of MA-iPLS + HSIC–VSIO, it was tested on two spectroscopic datasets: the surface-enhanced Raman scattering (SERS) spectra of chlorpyrifos standard solutions and the near infrared (NIR) spectra of diesel fuels. Nine methods, including PLS, HSIC–VSIO, MA-iPLS, genetic algorithm-interval partial least squares (GA-iPLS), particle swarm optimization-interval partial least squares (PSO-iPLS), interval random frog (iRF), GA-iPLS + HSIC–VSIO, PSO-iPLS + HSIC–VSIO, and iRF + HSIC–VSIO, were also applied to the same datasets for comparison. The results demonstrated the excellent performance of MA-iPLS + HSIC–VSIO for molecular spectroscopic analysis.
automation & control systems; computer science, artificial intelligence; instruments & instrumentation; statistics & probability; mathematics, interdisciplinary applications; chemistry, analytical
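
The abstract names the Hilbert-Schmidt independence criterion (HSIC) as the basis of the second, wavelength-level step but does not spell out how it is computed. As a rough illustration only, the sketch below computes the standard biased empirical HSIC estimator, trace(K H L H) / (n - 1)^2, between each wavelength and the response, and ranks wavelengths by that score. It is not the authors' HSIC–VSIO procedure, whose iterative optimization is not described here. The Gaussian kernel, the bandwidths, the toy data, and all function names (`gaussian_gram`, `center`, `hsic`, `rank_wavelengths`) are assumptions made for this example.

```python
import math


def gaussian_gram(values, sigma):
    """Gaussian (RBF) kernel Gram matrix for a list of scalars (assumed kernel)."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in values]
            for a in values]


def center(gram):
    """Double-center a Gram matrix: H K H with H = I - (1/n) * ones."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    col_means = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_means) / n
    return [[gram[i][j] - row_means[i] - col_means[j] + total_mean
             for j in range(n)] for i in range(n)]


def hsic(x, y, sigma_x=1.0, sigma_y=1.0):
    """Biased empirical HSIC between two scalar samples of equal length."""
    n = len(x)
    k_centered = center(gaussian_gram(x, sigma_x))
    l_gram = gaussian_gram(y, sigma_y)
    # trace(K H L H) equals trace((H K H) L) by cyclicity of the trace.
    trace = sum(k_centered[i][j] * l_gram[j][i]
                for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2


def rank_wavelengths(spectra, response):
    """Return wavelength indices sorted by decreasing HSIC with the response."""
    n_wavelengths = len(spectra[0])
    scores = []
    for j in range(n_wavelengths):
        column = [row[j] for row in spectra]
        scores.append((hsic(column, response), j))
    return [j for _, j in sorted(scores, reverse=True)]


# Toy data (hypothetical): 5 samples, 3 "wavelengths".
# Column 0 tracks the response, column 1 is constant, column 2 alternates.
response = [0.0, 1.0, 2.0, 3.0, 4.0]
spectra = [[0.5 * y, 1.0, 0.3 if i % 2 == 0 else -0.2]
           for i, y in enumerate(response)]
ranking = rank_wavelengths(spectra, response)
```

A constant wavelength has a centered Gram matrix of zeros, so its HSIC with any response is zero and it cannot rank above a wavelength that varies with the response. In a hybrid scheme like the one in the abstract, a score of this kind would be applied only to wavelengths inside the intervals retained by the first step.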