An adaptive strategy to improve the partial least squares model via minimum covariance determinant
Xudong Huang, Guangzao Huang, Xiaojing Chen, Zhonghao Xie, Shujat Ali, Xi Chen, Leiming Yuan, Wen Shi
DOI: https://doi.org/10.1016/j.chemolab.2024.105120
IF: 4.175
2024-04-17
Chemometrics and Intelligent Laboratory Systems
Abstract: Partial least squares (PLS) regression is a linear regression technique that performs well with high-dimensional regressors. Like many other supervised learning techniques, PLS degrades when the prediction and training data are drawn from different distributions. To address this problem, an adaptive strategy based on the minimum covariance determinant (MCD) estimator is proposed: it selects an appropriate training set so that an accurate PLS model can be built adaptively to fit the prediction data. In this study, the MCD estimator is used to find the h-subset of the merged prediction and training data that has the smallest covariance determinant. Prediction and training samples whose Mahalanobis distance to the h-subset is less than or equal to a cutoff, the square root of a quantile of the chi-squared distribution, are assumed to share the same distribution, and a PLS model is then built on the selected training samples. The proposed method is applied to three real-world datasets and compared with classic PLS. The largest improvement is obtained for the m5 prediction data in the corn dataset, where the root mean square error of prediction (RMSEP) is reduced from 0.149 to 0.023. On the other datasets, the method also outperforms PLS. These results demonstrate the effectiveness of the method.
automation & control systems; computer science, artificial intelligence; instruments & instrumentation; statistics & probability; mathematics, interdisciplinary applications; chemistry, analytical