Performance of feature extraction method for classification and identification of proteins based on three-dimensional fluorescence spectrometry

Jiwei Xu, Jianjie Xu, Zhaoyang Tong, Bin Du, Bing Liu, Xihui Mu, Tengxiao Guo, Siqi Yu, Shuai Liu, Chuan Gao, Jiang Wang, Zhiwei Liu, Pengjie Zhang
DOI: https://doi.org/10.1016/j.saa.2022.121841
2023-01-15
Abstract: Three-dimensional excitation-emission matrix (EEM) fluorescence spectroscopy was employed to discriminate six protein samples: bovine serum albumin, neurotensin, ovalbumin, ricin, trypsin from bovine pancreas, and trypsin from porcine pancreas. Two feature extraction methods, one with and one without spectral parameterization, were applied to the spectral data to evaluate how well each discriminates between the protein samples. Discrimination was carried out with the k-means clustering algorithm and an eigenvalue extraction procedure based on principal component analysis (PCA). Feature extraction without parameterization performed best, correctly attributing 100% of the spectral data when two principal components (PCs) were captured. Features extracted with spectral parameterization failed to separate ricin from trypsin from bovine pancreas under the same conditions. Without spectral parameterization, the lower dimensionality and the unique principal components captured by PCA indicate the spectrally resolved features of the corresponding protein samples. When clustering was performed on each spectrum at a fixed excitation wavelength, excitation wavelengths matching those of common intrinsic fluorophores were found to have a stronger effect on classification accuracy. The contributions of the spectral features extracted from the EEM to the principal components are discussed and demonstrate their ability to differentiate among the six protein samples. These results reveal that an appropriate feature extraction approach, combined with PCA, could be used as a spectroscopic diagnostic tool to discriminate protein samples at the species level. Our study provides a fundamental reference on computational strategies when EEMs are used to explore proteins in ambient environments.
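The pipeline described in the abstract (EEM spectra, PCA with two PCs, k-means clustering) can be illustrated with a minimal sketch. Everything below is an assumption for illustration and not the authors' implementation: the data are synthetic, the wavelength grid, peak positions, widths, amplitudes and noise level are hypothetical, and "feature extraction without parameterization" is read here as simply unfolding each raw EEM into one feature vector. PCA is computed with a plain SVD and k-means with a simple farthest-point initialization, using NumPy only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical EEM grid in nm; the abstract does not state the scan ranges.
ex = np.arange(250, 315, 5)   # 13 excitation wavelengths
em = np.arange(290, 405, 5)   # 23 emission wavelengths

def peak(ex0, em0, s_ex, s_em):
    """Synthetic 2-D Gaussian fluorescence peak on the (ex, em) grid."""
    return np.outer(np.exp(-((ex - ex0) ** 2) / (2 * s_ex ** 2)),
                    np.exp(-((em - em0) ** 2) / (2 * s_em ** 2)))

# Two made-up components, loosely tryptophan-like and tyrosine-like.
trp = peak(280, 350, 12, 25)
tyr = peak(275, 305, 12, 15)

# Six synthetic "proteins", each a different mix of the two components.
amps = np.array([[0.2, 0.2], [1.0, 0.2], [0.2, 1.0],
                 [1.0, 1.0], [1.8, 0.6], [0.6, 1.8]])
n_rep = 10                                  # replicate spectra per sample
truth = np.repeat(np.arange(len(amps)), n_rep)
eems = (amps[truth, 0][:, None, None] * trp
        + amps[truth, 1][:, None, None] * tyr
        + 0.01 * rng.standard_normal((len(truth), len(ex), len(em))))

# Assumed reading of "no parameterization": unfold each EEM into a row vector.
X = eems.reshape(len(eems), -1)

def pca_scores(X, n_components=2):
    """Return PC scores and explained-variance ratios via SVD of centered X."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    ratio = S ** 2 / np.sum(S ** 2)
    return U[:, :n_components] * S[:n_components], ratio[:n_components]

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's k-means with farthest-point initialization."""
    r = np.random.default_rng(seed)
    centers = [points[r.integers(len(points))]]
    for _ in range(1, k):
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def purity(labels, truth):
    """Fraction of spectra that fall in a cluster dominated by their own class."""
    hits = sum(np.bincount(truth[labels == j]).max() for j in np.unique(labels))
    return hits / len(truth)

scores, explained = pca_scores(X, n_components=2)
labels = kmeans(scores, k=len(amps))
accuracy = purity(labels, truth)
print("explained variance (2 PCs):", explained.sum())
print("cluster purity:", accuracy)
```

Because the synthetic spectra are built from only two components, two PCs are enough to separate them by construction. Real protein EEMs need not behave this way, and the result says nothing about the accuracy reported in the paper.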