Evaluation of calibration data for partial least squares modeling by using Monte Carlo cross validation

Jiajun Wang, Zhengfeng Li, Luoping Wang, Xihui Bian, Wensheng Cai, Xueguang Shao
DOI: https://doi.org/10.11719/com.app.chem20151228
2015-01-01
Abstract: A method based on Monte Carlo cross validation (MCCV) is proposed for evaluating calibration data for partial least squares (PLS) regression. In the method, the root mean squared error of cross validation (RMSECV) is calculated as usual from the prediction errors in MCCV, and a second RMSECV, denoted RMSECVc, is calculated from the prediction errors of the samples selected for building the models. If the calibration data contain no interfering factors, e.g., outliers, noise, or nonlinear responses, RMSECV and RMSECVc follow the same trend as the number of latent variables (LVs) increases. Otherwise, the two values diverge beyond the LV number at which the interfering factors are encoded in the model. A comparison of the RMSECV and RMSECVc curves can therefore be used to detect interfering factors in the calibration data. A simulated dataset and 12 real near-infrared spectroscopic datasets were used to test the proposed method, and the effect of outliers in four of the real datasets was analyzed. The results show that the method provides a useful tool for evaluating the calibration dataset and the quality of PLS models.
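The two curves described in the abstract can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the NIPALS-style PLS1 routine, the function names `pls1_coefs` and `mccv_curves`, and the defaults (`n_runs=200`, `cal_fraction=0.8`) are all assumptions made for the example, and the abstract does not specify the split ratio or the number of MCCV runs.

```python
import numpy as np


def pls1_coefs(X, y, max_lv):
    """Fit PLS1 (single response) with a NIPALS-style loop.

    Returns the centering terms and a (n_features, max_lv) array whose
    column a-1 holds the regression vector for a model with a LVs.
    Hypothetical helper written for this sketch.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(max_lv):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)          # weight vector
        t = Xr @ w                      # scores
        tt = t @ t
        p = Xr.T @ t / tt               # X loadings
        c = (yr @ t) / tt               # y loading
        Xr = Xr - np.outer(t, p)        # deflate X
        yr = yr - c * t                 # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = np.column_stack([
        W[:, :a] @ np.linalg.solve(P[:, :a].T @ W[:, :a], q[:a])
        for a in range(1, max_lv + 1)
    ])
    return x_mean, y_mean, B


def mccv_curves(X, y, max_lv, n_runs=200, cal_fraction=0.8, seed=0):
    """Return (RMSECV, RMSECVc) as arrays indexed by LV number - 1.

    RMSECV pools errors of the left-out samples over all random splits;
    RMSECVc pools errors of the samples used to build each model.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_cal = int(round(cal_fraction * n))
    sse_val = np.zeros(max_lv)
    sse_cal = np.zeros(max_lv)
    for _ in range(n_runs):
        perm = rng.permutation(n)
        cal, val = perm[:n_cal], perm[n_cal:]
        x_mean, y_mean, B = pls1_coefs(X[cal], y[cal], max_lv)
        pred = (X - x_mean) @ B + y_mean        # predictions for 1..max_lv LVs
        sq_err = (pred - y[:, None]) ** 2
        sse_val += sq_err[val].sum(axis=0)
        sse_cal += sq_err[cal].sum(axis=0)
    rmsecv = np.sqrt(sse_val / (n_runs * (n - n_cal)))
    rmsecvc = np.sqrt(sse_cal / (n_runs * n_cal))
    return rmsecv, rmsecvc
```

Plotting both returned arrays against the LV number gives the comparison the abstract describes: a point beyond which the two curves stop following the same trend would, under the paper's argument, suggest that outliers, noise, or nonlinearity are being encoded in the model.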