Unbiased prediction errors for partial least squares regression models: Choosing a representative error estimator for process monitoring

Peter B Skou,Margherita Tonolini,Carl Emil Eskildsen,Frans van den Berg,Morten Arendt Rasmussen
DOI: https://doi.org/10.1177/09670335231173139
2023-05-23
Journal of Near Infrared Spectroscopy
Abstract:Journal of Near Infrared Spectroscopy, Ahead of Print. Partial least squares (PLS) regression is widely used to predict chemical analytes from spectroscopic data, thus reducing the need for expensive and time-consuming wet chemical reference analysis in industrial process monitoring. However, predictions via PLS by definition carry sample-specific errors, and estimation of these errors is essential for correct interpretation of results. To increase trust in PLS regression-based predictions, reliable prediction error estimates must be reported. This can be achieved by determining realistic sample-specific prediction errors using an unbiased mean squared prediction error estimate. This work provides a guide for estimating sample-specific prediction errors, showing the importance of choosing an appropriate error estimator prior to deploying PLS models for industrial applications. We reviewed recent and established methods for estimating the sample-specific prediction error and test them through simulation studies. The methods were subsequently applied for estimating prediction errors in two real-life datasets from the food ingredients industry, where near-infrared spectroscopy was used to quantify i) urea in process water and ii) individual protein concentrations in ultrafiltration retentates from a protein fractionation process. Both the simulations and real data examples showed that the mean squared error of calibration is always a downward biased estimator. Although leave-one-out-cross-validation performed surprisingly well in the data analysed in this work, this paper demonstrated that the appropriate choice of error estimator requires the user to make an informed, data-centered decision.
spectroscopy,chemistry, applied
What problem does this paper attempt to address?