Study on Multi-Wavelength Transmission Spectral Feature Extraction Combined With Support Vector Machine for Bacteria Identification
Feng Chun, Zhao Nan-jing, Yin Gao-fang, Gan Ting-ting, Chen Xiao-wei, Chen Min, Hua Hui, Duan Jing-bo, Liu Jian-guo
DOI: https://doi.org/10.3964/j.issn.1000-0593(2021)09-2940-05
2021-01-01
Spectroscopy and Spectral Analysis
Abstract: Rapid identification of pathogenic bacteria is of great practical importance for preventing large-scale disease outbreaks caused by microbial contamination of water bodies. Conventional bacterial detection methods, such as biochemical identification and nucleic acid detection, are time-consuming and require precise experimental equipment, which makes them unsuitable for rapid, real-time online monitoring of bacteria. Because the multi-wavelength transmission spectrum of bacteria contains abundant characteristic information, and this spectral detection technique is fast, simple, non-contact, and non-polluting, it has become a research hotspot in bacterial detection in recent years. This article takes Klebsiella pneumoniae, Staphylococcus aureus, Salmonella typhimurium, Pseudomonas aeruginosa, and Escherichia coli as research objects. The characteristic wavelength range with the most significant spectral change is obtained by normalization and analysis of variance (ANOVA); characteristic spectral values, such as the absorbance at 200 nm and the slopes of short-wavelength bands, are extracted from this range; and a support vector machine (SVM) is used to predict the bacterial species. The results show that normalizing the multi-wavelength transmission spectrum effectively eliminates the concentration effect while retaining the complete original spectral information. ANOVA yields a characteristic wavelength range of 200-300 nm. The characteristic values extracted in this interval from the normalized spectral trend graphs of the five bacteria are as follows: the absorbance values at 200 nm are 0.0065, 0.0051, 0.0075, 0.0075, and 0.0085; the slope values in the 200-245 nm band are -62.45, -35.94, 81.30, 82.67, and -103.49; the slope values in the 250-275 nm band are -15.48, -14.82, -20.91, -13.92, and -26.21; and the slope values in the 280-300 nm band are -29.96, -24.62, -33.71, -36.09, and -30.88, respectively. Feature values were extracted from all samples, which were then randomly divided into a training set and a test set. For the SVM, the penalty-factor model and a linear kernel function were selected, and the optimal penalty factor c and kernel function parameter g were determined by an optimization algorithm. The prediction accuracy for all five bacterial species reached 100.0%. In summary, distinct spectral characteristic values can be extracted from the multi-wavelength transmission spectrum of bacteria through proper data preprocessing, and these values, combined with an SVM, can effectively identify different bacterial species. This method provides important technical support for the rapid identification and real-time online monitoring of waterborne bacteria.
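The normalization and feature-extraction steps can be illustrated with a minimal Python/NumPy sketch. It is not the authors' code: the abstract does not give the normalization formula, so area (sum) normalization is assumed here, and each band slope is assumed to be the least-squares slope of normalized absorbance against wavelength. The function names (`normalize_spectrum`, `band_slope`, `extract_features`), the 200-800 nm grid, and the synthetic spectrum are hypothetical. The reported slope values appear to include a scaling that the abstract does not state, so the magnitudes produced here will not match them.

```python
import numpy as np

def normalize_spectrum(absorbance):
    """Area normalization (assumed): divide a spectrum by the sum of its values.

    If absorbance is roughly proportional to cell concentration, this cancels
    the concentration factor while keeping the spectral shape.
    """
    a = np.asarray(absorbance, dtype=float)
    return a / a.sum()

def band_slope(wavelengths, values, lo, hi):
    """Least-squares slope of values vs. wavelength over [lo, hi] nm."""
    wl = np.asarray(wavelengths, dtype=float)
    mask = (wl >= lo) & (wl <= hi)
    slope, _intercept = np.polyfit(wl[mask], np.asarray(values, dtype=float)[mask], 1)
    return slope

def extract_features(wavelengths, absorbance):
    """Hypothetical feature vector: [A(200 nm), slope 200-245, slope 250-275, slope 280-300]."""
    wl = np.asarray(wavelengths, dtype=float)
    norm = normalize_spectrum(absorbance)
    a200 = norm[np.argmin(np.abs(wl - 200.0))]  # normalized absorbance nearest 200 nm
    bands = [(200, 245), (250, 275), (280, 300)]
    slopes = [band_slope(wl, norm, lo, hi) for lo, hi in bands]
    return np.array([a200, *slopes])

# Synthetic spectrum on an assumed 200-800 nm grid (1 nm steps), not measured data.
wl = np.arange(200, 801)
spectrum = np.exp(-(wl - 200) / 80.0) + 0.1
f1 = extract_features(wl, spectrum)
f2 = extract_features(wl, 3.0 * spectrum)  # same spectral shape at 3x concentration
print(np.allclose(f1, f2))  # True
```

Every feature is computed from the normalized spectrum, so multiplying the raw absorbance by a constant leaves the feature vector unchanged. This is the concentration-independence property that the abstract attributes to normalization.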
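One way to read "the characteristic wavelength range obtained by analysis of variance" is to compute a one-way ANOVA F statistic across species at every wavelength and then select the contiguous window where F is largest on average. The sketch below follows that reading under stated assumptions. The window-selection rule, the function names `anova_f_per_wavelength` and `best_window`, and the synthetic data (classes that differ only between 200 and 300 nm) are hypothetical illustrations, not details taken from the paper.

```python
import numpy as np

def anova_f_per_wavelength(spectra, labels):
    """One-way ANOVA F statistic at every wavelength.

    spectra: (n_samples, n_wavelengths) array, assumed already normalized
    labels:  (n_samples,) species label of each sample
    """
    X = np.asarray(spectra, dtype=float)
    y = np.asarray(labels)
    classes = np.unique(y)
    n, k = X.shape[0], classes.size
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        ss_between += Xc.shape[0] * (mean_c - grand_mean) ** 2
        ss_within += ((Xc - mean_c) ** 2).sum(axis=0)
    # F = (SS_between / (k - 1)) / (SS_within / (n - k))
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def best_window(wavelengths, f_values, width):
    """First and last wavelength of the width-point window with the largest mean F (assumed rule)."""
    f = np.asarray(f_values, dtype=float)
    window_means = np.convolve(f, np.ones(width) / width, mode="valid")
    start = int(np.argmax(window_means))
    return float(wavelengths[start]), float(wavelengths[start + width - 1])

# Synthetic data: 5 classes x 10 samples; classes differ only at 200-300 nm.
rng = np.random.default_rng(0)
wl = np.arange(200, 801)
labels = np.repeat(np.arange(5), 10)
spectra = np.ones((labels.size, wl.size))
spectra[:, wl <= 300] += 0.5 * labels[:, None]
spectra += rng.normal(0.0, 0.05, spectra.shape)

F = anova_f_per_wavelength(spectra, labels)
print(best_window(wl, F, 101))  # (200.0, 300.0)
```

At each single wavelength, the statistic matches `scipy.stats.f_oneway` applied to the per-species groups. The NumPy version simply vectorizes it across all wavelengths at once.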
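The classification step pairs a penalty factor c with a kernel parameter g. In LIBSVM-style notation, g (gamma) applies only to RBF, polynomial, and sigmoid kernels and has no effect with a linear kernel, so the sketch below tunes only C. The abstract does not name its optimization algorithm. As a stand-in, this sketch grid-searches C on a hold-out validation split carved from the training data. The linear SVM itself is implemented from scratch in NumPy as a one-vs-rest model fitted by subgradient descent on the primal hinge-loss objective. It is a simplified substitute for a LIBSVM solver, not the authors' implementation. The features are synthetic 4-dimensional stand-ins for the four spectral values, and all function names are hypothetical.

```python
import numpy as np

def train_ovr_linear_svm(X, y, C, epochs=1000, lr=0.01):
    """One-vs-rest linear SVM fitted by full-batch subgradient descent on
    0.5*||w||^2 + C * sum_i max(0, 1 - t_i * (w.x_i + b))."""
    classes = np.unique(y)
    W = np.zeros((classes.size, X.shape[1]))
    b = np.zeros(classes.size)
    for k, cls in enumerate(classes):
        t = np.where(y == cls, 1.0, -1.0)  # +1 for this class, -1 for the rest
        w, bias = np.zeros(X.shape[1]), 0.0
        for _ in range(epochs):
            active = t * (X @ w + bias) < 1.0  # samples violating the margin
            w -= lr * (w - C * (t[active][:, None] * X[active]).sum(axis=0))
            bias += lr * C * t[active].sum()
        W[k], b[k] = w, bias
    return classes, W, b

def predict(model, X):
    """Assign each sample to the class whose one-vs-rest score is highest."""
    classes, W, b = model
    return classes[np.argmax(X @ W.T + b, axis=1)]

def grid_search_C(X_tr, y_tr, X_val, y_val, grid=(0.1, 1.0, 10.0)):
    """Return the penalty factor C with the best validation accuracy (first on ties)."""
    scores = [np.mean(predict(train_ovr_linear_svm(X_tr, y_tr, C), X_val) == y_val)
              for C in grid]
    return grid[int(np.argmax(scores))]

# Synthetic 4-feature data for 5 classes (stand-ins for A200 and three band slopes).
rng = np.random.default_rng(42)
centers = np.vstack([np.zeros(4), 3.0 * np.eye(4)])
y = np.repeat(np.arange(5), 20)
X = centers[y] + rng.normal(0.0, 0.2, size=(y.size, 4))

# Random split: 60 samples train, 20 validation (for choosing C), 20 test.
idx = rng.permutation(y.size)
tr, va, te = idx[:60], idx[60:80], idx[80:]
best_C = grid_search_C(X[tr], y[tr], X[va], y[va])
train_idx = np.r_[tr, va]
model = train_ovr_linear_svm(X[train_idx], y[train_idx], best_C)
accuracy = np.mean(predict(model, X[te]) == y[te])
print(best_C, accuracy)
```

With features as well separated as the synthetic clusters here, a linear decision rule is sufficient. This is consistent with, though it does not reproduce, the reported 100.0% accuracy. In practice, a library solver such as LIBSVM or scikit-learn's `SVC(kernel="linear")` would replace the hand-written training loop.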