Data mining of near and middle infrared spectroscopy for determination of dry matter content of tea based on wavelet packet transform

Xiaoli Li, Yong He
2011-01-01
Abstract: This manuscript investigates the relationship between infrared spectra and the dry matter content of tea using data mining and regression analysis. A total of 576 intact, unground tea samples were collected, and their diffuse reflectance spectra were scanned over the range 7500-400 cm⁻¹ with a Fourier transform infrared spectrometer. The dry matter content was measured according to the Chinese national standard GB 8303-87. First, spectral pre-treatment was applied to eliminate baseline drift and other disturbances caused by the irregular size of the samples. Then the wavelet packet transform (WPT) was employed as a data mining algorithm to separate noise from the spectral data and to extract characteristic information with low dimensionality. For the WPT, data mining was realized through a novel feature optimization approach based on the structure of the WPT and statistical analysis (WPT-SA). Next, four sets of wavelet packet coefficients were generated using the wavelet function db1 at the second scale. Finally, regression models were established separately on the original spectral data and on the wavelet packet coefficients. The model based on the packet (2, 0) coefficients outperformed the other models, and the optimal regression model achieved a high R² of 0.9075 and a low root mean square error of 0.0715. These results indicate that it is feasible to measure the dry matter content of tea by near and middle infrared spectroscopy, and that the proposed WPT-SA feature optimization approach is an effective data mining algorithm for enhancing the capability of spectral measurement.
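To make the decomposition step concrete, the following is a minimal sketch of a two-level wavelet packet transform with the db1 (Haar) wavelet, which yields the four coefficient sets (2, 0) to (2, 3) mentioned in the abstract. It is not the authors' implementation. The function names `haar_step` and `wavelet_packets`, the toy `spectrum` values, the sign convention of the detail filter, and the natural (non-frequency) ordering of the packets are all assumptions made for illustration, and the WPT-SA feature selection and regression steps are not reproduced.

```python
import math


def haar_step(signal):
    """One level of the Haar (db1) transform: returns (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    # Pairwise scaled sums (low-pass) and differences (high-pass), downsampled by 2.
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def wavelet_packets(signal, level=2):
    """Full Haar wavelet packet tree down to `level`.

    Returns {(level, index): coefficients} in natural order, where node (j, n)
    has a low-pass child (j + 1, 2n) and a high-pass child (j + 1, 2n + 1).
    """
    nodes = {(0, 0): list(signal)}
    for j in range(level):
        for n in range(2 ** j):
            approx, detail = haar_step(nodes[(j, n)])
            nodes[(j + 1, 2 * n)] = approx
            nodes[(j + 1, 2 * n + 1)] = detail
    return {key: coeffs for key, coeffs in nodes.items() if key[0] == level}


# Hypothetical toy "spectrum"; real spectra would have thousands of points.
spectrum = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
packets = wavelet_packets(spectrum, level=2)

for key in sorted(packets):
    print(key, packets[key])
```

In this sketch, packet (2, 0) is the twice low-pass-filtered version of the input, so it carries the smooth trend of the spectrum in a quarter of the original number of points, while the other three packets carry progressively finer detail. Because the Haar transform is orthonormal, the total energy (sum of squares) of the four packets equals that of the input.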