Machine learning-based prediction of total phenolic and flavonoid in horticultural products

Kusumiyati Kusumiyati, Yonathan Asikin
DOI: https://doi.org/10.1515/opag-2022-0163
2023-01-01
Open Agriculture
Abstract: The purpose of this study was to predict the total phenolic content (TPC) and total flavonoid content (TFC) in several horticultural commodities using near-infrared spectroscopy (NIRS) combined with machine learning. Although models are typically developed for a single product, expanding the coverage of a model to several products can improve efficiency. In this study, 700 samples were used, including varieties of shallot, cayenne pepper, and red chili. The TPC model yielded R² cal, root mean square error of the calibration set, R² pred, root mean square error of the prediction set, and ratio of performance to deviation values of 0.79, 123.33, 0.78, 124.20, and 2.13, respectively. The TFC model produced values of 0.71, 44.52, 0.72, 42.10, and 1.87, respectively. The wavelengths 912, 939, and 942 nm are closely related to phenolic compounds and flavonoids. The accuracy of the models in this study was satisfactory. Therefore, the application of NIRS and machine learning to horticultural products has high potential to replace conventional laboratory analysis of TPC and TFC.
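The three kinds of figures quoted in the abstract (R², root mean square error, and ratio of performance to deviation) can be computed from paired reference and predicted values. The following is a minimal sketch, not the authors' code. It assumes the common definition of RPD as the sample standard deviation of the reference values divided by the RMSE; the function name `regression_metrics` and the example numbers are hypothetical.

```python
import math
from statistics import mean, stdev


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE, and RPD for paired reference/predicted values.

    Assumption: RPD = sample standard deviation of the reference values
    divided by RMSE (a common convention; the paper may use a variant).
    """
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with >= 2 values")

    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)          # residual sum of squares
    y_bar = mean(y_true)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)  # total sum of squares

    rmse = math.sqrt(ss_res / len(y_true))
    r2 = 1.0 - ss_res / ss_tot
    rpd = stdev(y_true) / rmse
    return {"r2": r2, "rmse": rmse, "rpd": rpd}


# Hypothetical reference and predicted values, for illustration only.
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
metrics = regression_metrics(reference, predicted)
print(metrics)
```

Under this definition, RPD is approximately 1 / sqrt(1 - R²) on the same data set, which is consistent with the reported prediction values (R² pred of 0.78 alongside an RPD of 2.13).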
What problem does this paper attempt to address?
This paper addresses the problem of predicting the total phenolic content (TPC) and total flavonoid content (TFC) of several horticultural products using near-infrared spectroscopy (NIRS) combined with machine-learning techniques. Traditionally, determining these quality attributes requires chemical analysis in the laboratory, which is time-consuming and costly and also generates chemical waste that is harmful to the environment. The researchers therefore aimed to develop a rapid, non-destructive method to replace traditional laboratory analysis, improving efficiency and reducing environmental pollution. Specifically, they used 700 samples, including different varieties of shallot, cayenne pepper, and red chili, collected spectral data with NIRS, and combined multiple machine-learning algorithms to build prediction models. The goal of the study was to verify whether the combination of NIRS and machine learning can effectively predict the TPC and TFC of these horticultural products, and to determine which spectral pre-processing method best improves the models' predictive ability. The results show that the original (unprocessed) spectral data performed best in predicting TPC and TFC. Although multiple spectral pre-processing methods were tried, they did not significantly improve model accuracy. This may be because the data set itself has high variability, whereas spectral pre-processing usually works by minimizing variability in the data. In addition, the study found that the 900-950 nm wavelength region is particularly important for building prediction models for TPC and TFC.
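To illustrate how a pre-processing step minimizes variability between spectra, here is a sketch of the standard normal variate (SNV) transform, a widely used NIRS pre-processing method. This summary does not list which methods the study tried, so SNV is an assumption chosen only for illustration, and the function name `snv` and the toy spectra are hypothetical.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: center and scale one spectrum.

    Each spectrum is transformed on its own, using its own mean and
    sample standard deviation across wavelengths.
    """
    mu = mean(spectrum)
    sd = stdev(spectrum)
    if sd == 0:
        raise ValueError("spectrum is constant; SNV is undefined")
    return [(x - mu) / sd for x in spectrum]


# Two toy "spectra" with the same shape but different offset and scale.
spectrum_a = [1.0, 2.0, 3.0]
spectrum_b = [10.0, 20.0, 30.0]

print(snv(spectrum_a))  # [-1.0, 0.0, 1.0]
print(snv(spectrum_b))  # [-1.0, 0.0, 1.0]
```

Both toy spectra map to the same values after SNV, which shows the kind of between-sample variation this step removes. When a data set spans several commodities, some of that variation may carry useful information, which is one possible reading of why the raw spectra performed best here.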