Distinguishing raw pu-erh tea production regions through a combination of HS-SPME-GC-MS and machine learning algorithms

Zhichao Xiong, Wanzhen Feng, Dongzhou Xia, Jixin Zhang, Yuming Wei, Tiehan Li, Junlan Huang, Yujie Wang, Jingming Ning
DOI: https://doi.org/10.1016/j.lwt.2023.115140
IF: 6.056
2023-08-01
LWT
Abstract: The authenticity of the geographical origin of agricultural products has received widespread attention. Tea tree variety, processing method, and origin all influence the quality and price of raw pu-erh tea (RPT). This study distinguished RPT from 10 production areas using headspace solid-phase microextraction-gas chromatography-mass spectrometry (HS-SPME-GC-MS) combined with an orthogonal partial least squares–discriminant analysis (OPLS-DA) model and machine learning algorithms. Of the 35 common volatile compounds identified, five were selected as the key differential compounds separating the 10 production areas: pentanal, heptanal, naphthalene, cedrol, and 2,6-di-tert-butylbenzoquinone. They were screened using the variable importance in projection (VIP) values of the OPLS-DA model and the coefficient weights of the linear discriminant analysis function. Of these, heptanal and 2,6-di-tert-butylbenzoquinone had their highest content in West of Bingdao and their lowest in Nannuoshan. Using the five key compounds, the random forest algorithm discriminated the 63 RPT samples with an accuracy of 98.4%. Receiver operating characteristic curves (area under the curve = 0.7603) were used to demonstrate that the random forest model was reliable and valid. The results serve as a reference for differentiating the 10 production areas of RPT.
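The two evaluation figures quoted in the abstract, classification accuracy and area under the ROC curve, can be illustrated with a minimal sketch. The abstract does not say how either was computed (for example, which validation scheme or which multiclass averaging was used), so everything below is an assumption: the toy labels and scores are hypothetical, the function names are made up for illustration, and the one-vs-rest macro average is only one common way to extend AUC to more than two classes. The one grounded piece of arithmetic is that 62 correct predictions out of 63 samples rounds to 98.4%.

```python
from itertools import product


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_auc(labels, scores):
    """AUC as the chance that a positive sample outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true, proba, classes):
    """Unweighted mean of one-vs-rest AUCs; proba[i][k] scores classes[k]."""
    aucs = []
    for k, c in enumerate(classes):
        labels = [1 if t == c else 0 for t in y_true]
        scores = [row[k] for row in proba]
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / len(aucs)


# Consistency check on the reported figure: 62 of 63 correct rounds to 98.4%.
reported_pct = round(100 * 62 / 63, 1)

# Hypothetical toy data (NOT from the paper): 3 areas, 6 samples.
classes = ["A", "B", "C"]
y_true = ["A", "A", "B", "B", "C", "C"]
proba = [
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.5, 0.4, 0.1],  # a "B" sample whose top score goes to "A"
    [0.1, 0.2, 0.7],
    [0.2, 0.1, 0.7],
]
# Predicted label = class with the highest score in each row.
y_pred = [classes[row.index(max(row))] for row in proba]

toy_accuracy = accuracy(y_true, y_pred)
toy_auc = macro_ovr_auc(y_true, proba, classes)
```

In this toy case one sample is misclassified, yet every class still ranks its own samples above the rest, so accuracy and AUC measure different things and need not move together.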
food science & technology