Non-linear machine learning coupled near infrared spectroscopy enhanced model performance and insights for coffee origin traceability

Joy Sim, Cushla McGoverin, Indrawati Oey, Russell Frew, Biniam Kebede
DOI: https://doi.org/10.1177/09670335241269014
2024-08-30
Journal of Near Infrared Spectroscopy
Abstract: Over the past decade, there has been overwhelming interest in rapid and routine methods for origin tracing and authentication, such as near infrared (NIR) spectroscopy. This study systematically coupled NIR spectroscopy with advanced machine learning models to explore the origin classification of coffee at several scales, from the continental to the regional level. Speciality green coffee beans were sourced from three continents, eight countries, and 22 regions. Spectra were recorded with a dispersive bulk NIR instrument in reflectance mode and preprocessed with extended multiplicative scatter correction and mean centering. The classical linear partial least squares-discriminant analysis (PLS-DA) adequately predicted origin at the continental and country levels and showed promise at the regional level. Non-linear machine learning models improved predictions further; random forest performed best, with accuracies up to 0.99. Discriminating wavelength regions and constituents were identified at each origin scale, with more minor wavelength regions selected by random forest. This proof-of-concept work demonstrated the potential of NIR spectroscopy coupled with machine learning for rapid origin classification of coffee from the continental to the regional level.
spectroscopy; chemistry, applied
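
The preprocessing named in the abstract (extended multiplicative scatter correction followed by mean centering) can be illustrated with a minimal sketch. The abstract does not say which EMSC variant the authors used, so everything here is an assumption: the function names `emsc` and `mean_center` are hypothetical, the baseline is modelled as a second-order polynomial, and the reference defaults to the mean spectrum. It shows one common basic form of the technique, not the authors' implementation.

```python
import numpy as np


def emsc(spectra, reference=None, poly_order=2):
    """Basic EMSC sketch (assumed variant, not taken from the paper).

    Each spectrum is modelled as a * reference + polynomial baseline.
    The fitted baseline is subtracted and the result divided by a.
    """
    spectra = np.asarray(spectra, dtype=float)
    n_points = spectra.shape[1]
    if reference is None:
        # Assumption: use the mean spectrum as the reference.
        reference = spectra.mean(axis=0)
    # Scaled wavelength axis keeps the polynomial terms well conditioned.
    x = np.linspace(-1.0, 1.0, n_points)
    # Baseline columns: 1, x, x^2, ... up to poly_order.
    baseline = np.vander(x, poly_order + 1, increasing=True)
    design = np.column_stack([reference, baseline])
    # Least-squares fit of all spectra at once; one coefficient column per spectrum.
    coefs, *_ = np.linalg.lstsq(design, spectra.T, rcond=None)
    scale = coefs[0]                      # multiplicative scatter term per spectrum
    baseline_fit = baseline @ coefs[1:]   # additive baseline per spectrum
    return ((spectra.T - baseline_fit) / scale).T


def mean_center(spectra):
    """Subtract the column-wise (per-wavelength) mean across samples."""
    spectra = np.asarray(spectra, dtype=float)
    return spectra - spectra.mean(axis=0)
```

In such a pipeline the corrected, centred matrix would then be passed to a classifier such as PLS-DA or random forest. Both functions expect a two-dimensional array with one row per sample and one column per wavelength.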