Study on Raman Spectral Characteristics of Breast Cancer Based on Multivariable Spectral Data Analysis Methods

Zhang Bao-ping, Ning Tian, Zhang Fu-rong, Chen Yi-shen, Zhang Zhan-qin, Wang Shuang
DOI: https://doi.org/10.3964/j.issn.1000-0593(2023)02-0426-09
2023-01-01
Abstract: Compared with cell and sliced tissue samples, blood samples are easier to collect, and their biochemical composition may show relevant changes before clinical pathological symptoms appear. Raman spectroscopy provides molecular-level information about biochemical content for clinical investigations in a rapid, label-free, nondestructive, and noninvasive way, which makes it promising for blood-sample-based diagnosis. In this study, we present a reliable method for detecting breast cancer that combines Raman spectra of blood serum with multivariate analysis. The serum samples were divided into healthy, early-stage cancer, and advanced cancer groups according to clinical pathological diagnosis. Using quartz capillary tubes as sample holders, Raman spectra were acquired to characterize the biochemical composition of the serum samples. Spectral classification models based on principal component analysis (PCA), linear discriminant analysis (LDA), support vector machines (SVM), and partial least squares discriminant analysis (PLS-DA) were used to reveal the spectral differences among the investigated groups, and leave-one-out cross-validation (LOOCV) was adopted to evaluate classification performance. We observed the resonance Raman signal of carotenoids in serum and identified spectral variations in protein and lipid content during breast cancer progression. The multivariate analysis identified the representative spectral features of each group. The PCA-LDA model reached a classification accuracy of 99%.
For the PCA-SVM model with three kernel types, the linear kernel reached 92% accuracy with c=0.003, the RBF kernel reached 94% with c=0.125 and gamma=256, and the polynomial kernel reached 92% with c=0.003 and d=11. The classification accuracy of PLS-DA was 80%. These results provide a theoretical and experimental foundation for early screening and diagnosis of breast cancer based on serum Raman spectroscopy.
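The evaluation scheme described in the abstract (PCA for dimensionality reduction, an LDA or SVM classifier, and LOOCV for scoring) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation. The spectra are synthetic, and the group sizes, the number of principal components, and the SVM settings are placeholder assumptions. The c and gamma values reported above depend on the authors' preprocessing, which the abstract does not give, so they are not reused here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Synthetic stand-in for serum spectra (assumption: 3 groups of 20 spectra,
# 200 spectral points). The groups differ only in the height of one band.
rng = np.random.default_rng(0)
n_per_group, n_points = 20, 200
axis = np.linspace(0.0, 1.0, n_points)
band = np.exp(-((axis - 0.5) ** 2) / 0.002)
X = np.vstack([
    rng.normal(0.0, 0.1, (n_per_group, n_points)) + height * band
    for height in (0.0, 0.5, 1.0)
])
y = np.repeat([0, 1, 2], n_per_group)  # 0: healthy, 1: early, 2: advanced


def loocv_accuracy(model, X, y):
    """Mean accuracy over leave-one-out folds (each fold tests one spectrum)."""
    return cross_val_score(model, X, y, cv=LeaveOneOut()).mean()


# PCA sits inside the pipeline, so it is refit on the training spectra
# of every fold and the held-out spectrum never influences the projection.
pca_lda = make_pipeline(PCA(n_components=5), LinearDiscriminantAnalysis())
pca_svm = make_pipeline(PCA(n_components=5),
                        SVC(kernel="rbf", C=1.0, gamma="scale"))  # placeholder settings

acc_lda = loocv_accuracy(pca_lda, X, y)
acc_svm = loocv_accuracy(pca_svm, X, y)
print(f"PCA-LDA LOOCV accuracy: {acc_lda:.2f}")
print(f"PCA-SVM (RBF) LOOCV accuracy: {acc_svm:.2f}")
```

Placing PCA inside the pipeline is a design choice of this sketch. Whether the original study refit PCA within each fold is not stated in the abstract.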