Geographical Origin Identification of Chinese Tomatoes Using Long-Wave Fourier-Transform Near-Infrared Spectroscopy Combined with Deep Learning Methods
Weidong Yuan, Hongzhe Jiang, Mengmeng Sun, Yu Zhou, Cong Zhang, Hongping Zhou
DOI: https://doi.org/10.1007/s12161-023-02444-1
IF: 2.9
2023-01-18
Food Analytical Methods
Abstract: Tomato cultivation in China is concentrated in Xinjiang, Henan, and Shandong provinces, and tomato quality differs markedly between these regions. Xinjiang tomatoes have exceptional nutritional composition and sensory quality, and counterfeit Xinjiang tomato products are proliferating in the market. This study investigated the feasibility of identifying the geographical origin of Chinese tomatoes using long-wave Fourier-transform near-infrared spectroscopy (FT-NIR, 10,000–4000 cm⁻¹). First, principal component analysis (PCA) was conducted on the raw spectra, and the first two principal components (PCs) were found to distinguish the geographical origins of the tomatoes effectively. Next, partial least squares discriminant analysis (PLS-DA) was combined with different preprocessing methods, and the PLS-DA model based on the raw spectra performed best. This optimal PLS-DA model achieved a correct classification rate (CCR) of 97.8% on an external prediction set, indicating that the raw spectra contained sufficient valid information. In addition, six algorithms, namely grid search (GS), genetic algorithm (GA), particle swarm optimization (PSO), grey wolf optimizer (GWO), improved grey wolf optimizer (IGWO), and sparrow search algorithm (SSA), were employed to optimize the parameters of a support vector machine (SVM). The SSA-SVM model exhibited the best performance, with a CCR of 97.8% on the prediction set. Afterward, 40 and 8 spectral features were extracted from the raw full spectra using a stacked autoencoder (SAE) and PC loadings, respectively. Finally, qualitative analysis models based on these feature variables were investigated, and the simplified SAE-SSA-SVM model performed best, with CCRs of 98.7%, 95.6%, and 95.6% on the calibration, cross-validation, and prediction sets, respectively.
The results of this study provide theoretical support for applying long-wave FT-NIRS combined with machine learning and deep learning to tomato origin identification.
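The correct classification rate (CCR) quoted throughout the abstract is the percentage of samples assigned to their true origin. The following is a minimal sketch of that calculation, not the authors' code: the function name, the labels, and the set size of 45 samples are all hypothetical, chosen only so that one misclassification gives a value that rounds to 97.8%. The abstract does not state the actual size of the prediction set.

```python
def correct_classification_rate(y_true, y_pred):
    """Return the percentage of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


# Hypothetical prediction set: 15 samples per origin (assumed, not from the paper).
y_true = ["Xinjiang"] * 15 + ["Henan"] * 15 + ["Shandong"] * 15

# Hypothetical model output with a single misclassified sample.
y_pred = list(y_true)
y_pred[0] = "Henan"

ccr = correct_classification_rate(y_true, y_pred)
print(f"CCR = {ccr:.1f}%")  # 44 of 45 correct -> CCR = 97.8%
```

With a set this small, each misclassified sample moves the CCR by about 2.2 percentage points, so small differences in CCR between models correspond to very few samples.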
food science & technology