Near Infrared Spectral Analysis Based on Data Augmentation Strategy and Convolutional Neural Network
Yun Zheng, Si-Yu Yang, Tao Wang, Zhuo-Wen Deng, Lan Wei-Jie, Yong-Huan Yun, Lei-Qing Pan
DOI: https://doi.org/10.19756/j.issn.0253-3820.241155
IF: 1.193
2024-01-01
Chinese Journal of Analytical Chemistry
Abstract: Near infrared spectroscopy (NIRS) combined with chemometric algorithms has been widely used for quantitative and qualitative analysis of food and medicine. However, traditional chemometric methods, especially linear classification methods, often yield unsatisfactory results on multi-class classification problems. Convolutional neural networks (CNNs) are adept at extracting deep-level features from data and are suitable for handling non-linear relationships. The modeling performance of a CNN depends on the size and diversity of the sample set, while collecting and preprocessing NIRS sample data is often time-consuming and labor-intensive. This study proposed a NIRS qualitative analysis method based on data augmentation strategies and CNN. The data augmentation strategy included two steps. First, Bootstrap resampling and generative adversarial network (GAN) methods were applied to augment three NIRS datasets (medicine, coffee and grape). Second, the original samples (Y) were combined with the Bootstrap-augmented samples (B) and the GAN-augmented samples (G) to obtain three augmented datasets (Y-B, Y-G and Y-B-G). On this basis, a CNN model structure suitable for these datasets was designed, consisting of 2 one-dimensional convolutional layers, 1 max-pooling layer, and 1 fully connected layer.
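The two-step augmentation strategy can be illustrated with a minimal sketch. It assumes that Bootstrap augmentation means drawing spectra with replacement from the original samples, which the abstract does not specify in detail. The function names `bootstrap_augment` and `combine`, the toy spectra, and the placeholder values standing in for GAN output are all hypothetical and not taken from the study.

```python
import random

def bootstrap_augment(spectra, labels, n_new, seed=0):
    """Draw n_new (spectrum, label) pairs with replacement from the originals."""
    rng = random.Random(seed)
    idx = [rng.randrange(len(spectra)) for _ in range(n_new)]
    return [list(spectra[i]) for i in idx], [labels[i] for i in idx]

def combine(*parts):
    """Concatenate several (spectra, labels) parts into one dataset."""
    X, y = [], []
    for spectra, labels in parts:
        X.extend(spectra)
        y.extend(labels)
    return X, y

# Toy original set Y: 4 spectra, 5 wavelengths each, 2 classes (made-up values).
Y = ([[0.10, 0.12, 0.15, 0.13, 0.11],
      [0.11, 0.13, 0.16, 0.14, 0.12],
      [0.30, 0.28, 0.25, 0.27, 0.29],
      [0.31, 0.29, 0.26, 0.28, 0.30]],
     [0, 0, 1, 1])

# Step 1: augment. B is produced by Bootstrap resampling; G is a placeholder
# for GAN-generated spectra (training a GAN is out of scope for this sketch).
B = bootstrap_augment(*Y, n_new=6)
G = ([[0.105, 0.125, 0.155, 0.135, 0.115]], [0])

# Step 2: combine originals with augmented samples.
Y_B = combine(Y, B)
Y_G = combine(Y, G)
Y_B_G = combine(Y, B, G)
print(len(Y_B[0]), len(Y_G[0]), len(Y_B_G[0]))  # prints: 10 5 11
```

Plain resampling with replacement only duplicates existing spectra, so any added diversity in this sketch would have to come from the GAN-generated part; the study's actual Bootstrap procedure may differ.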
The results showed that, compared to the optimal models of partial least squares discriminant analysis (PLS-DA), support vector machine (SVM), and back propagation neural network (BP), the CNN model based on the Y-B dataset achieved average accuracy improvements of 3.998%, 9.364%, and 4.689%, respectively, for medicine (binary classification); the CNN model based on the Y-B-G dataset achieved average accuracy improvements of 6.001%, 2.004%, and 7.523%, respectively, for coffee (7-class classification); and the CNN model based on the Y-B dataset achieved average accuracy improvements of 33.408%, 51.994%, and 34.378%, respectively, for grapes (20-class classification). Models built on data augmentation strategies and CNN thus demonstrated better classification accuracy and generalization performance across different datasets and numbers of classes.
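The layer structure named in the abstract (2 one-dimensional convolutional layers, 1 max-pooling layer, 1 fully connected layer) can be traced with simple shape arithmetic. The sketch below is only an illustration: the layer order (convolution, convolution, pooling, fully connected), the spectrum length, kernel sizes, and channel counts are assumptions, since the abstract does not report them. Only the class count of 7 (the coffee dataset) comes from the text.

```python
def conv1d_out(length, kernel, stride=1, padding=0):
    """Output length of a 1D convolution (standard formula, no dilation)."""
    return (length + 2 * padding - kernel) // stride + 1

def pool1d_out(length, kernel, stride=None):
    """Output length of 1D max pooling; stride defaults to the kernel size."""
    stride = stride or kernel
    return (length - kernel) // stride + 1

# Hypothetical hyperparameters (not from the study).
n_wavelengths = 700      # points per spectrum
channels = (16, 32)      # output channels of the two conv layers
kernel = 5               # conv kernel size
pool = 2                 # pooling window
n_classes = 7            # coffee dataset: 7-class classification

l1 = conv1d_out(n_wavelengths, kernel)   # after conv layer 1
l2 = conv1d_out(l1, kernel)              # after conv layer 2
l3 = pool1d_out(l2, pool)                # after max pooling
fc_in = channels[1] * l3                 # flattened input to the dense layer

print("conv1:", (channels[0], l1))
print("conv2:", (channels[1], l2))
print("pool :", (channels[1], l3))
print("fc   :", (fc_in, n_classes))
```

With these assumed values the fully connected layer maps 11072 flattened features to 7 class scores; changing the spectrum length or kernel sizes changes that input width accordingly.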