Evaluation of Filtering Effects of Orthogonal Signal Correction on Metabonomic Analysis of Healthy Human Serum H-1 NMR Spectra
Mao Hai-Lei,Xu Min,Wang Bin,Wang Hui-Min,Deng Xiao-Ming,Lin Dong-Hai
DOI: https://doi.org/10.3321/j.issn:0567-7351.2007.02.012
2007-01-01
Abstract: Three pattern recognition methods were applied, before and after orthogonal signal correction (OSC), to the metabonomic analysis of H-1 NMR spectra of healthy human sera, in order to explore the potential of H-1 NMR-based metabonomics in clinical research. First, 78 sera were collected from healthy humans after a routine 8 h fast, and the corresponding 1D H-1 NMR spectra were recorded on a Varian Unity INOVA-600 spectrometer. Three pattern recognition analyses were then performed: PCA (principal component analysis), PLS-DA (partial least squares-discriminant analysis), and SIMCA (soft independent modeling of class analogy). Although sample collection placed no specific restrictions on food, lifestyle, or physiological cycle, PLS-DA after OSC distinguished the NMR metabonomic profiles of male sera from those of female sera more clearly than either PCA or SIMCA. Furthermore, the major NMR integral regions relevant to gender classification from PLS-DA after OSC were identical to those reported in the literature for PLS-DA without an OSC filter. In the figure showing how the PLS-DA model varies before OSC and after removal of different numbers of OSC latent variables (LVs), the eigenvalues of the first and second OSC-removed LVs were much greater than the others. After two LVs were removed by OSC, the remaining sum of squares (RSS) in the X block was 20.82%; that is, information unrelated to Y amounting to 79.18% of the X block was removed from the PLS-DA model. Meanwhile, the number of LVs in the PLS-DA model fell to one, compared with two for the model with only the first LV removed by OSC and three for the model without OSC. R2X, R2Y, and Q2 (cum) are commonly used to evaluate the quality of a PLS-DA model. R2X and R2Y are the fractions of the total sum of squares of X and Y, respectively, explained by the current LV of the PLS-DA model, and represent the explained variance of the X and Y variables, while Q2 is the cross-validated R2. 
Q2 (cum) reflects the cumulative cross-validated fraction of the total variation of X and Y that can be predicted by the current LV of the PLS-DA model. In our study, after the first two LVs were removed by OSC filtering, R2X reached its minimum, suggesting that the least systematic variance was present in the current PLS-DA model. Meanwhile, both R2Y and Q2 (cum) were always higher than 80%, indicating the good quality of the PLS-DA model. OSC is thus capable of eliminating the influence of dietary and environmental factors and decreasing the heterogeneity of samples, which is useful and important for clinical investigations. Additionally, the appropriate number of OSC-removed LVs should be determined on the basis of the RSS in the X block, the eigenvalues of the OSC-removed latent variables, the LV number, and the quality indicators of the PLS-DA model.
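
The idea of removing Y-unrelated latent variables from the X block and then reporting the remaining sum of squares can be illustrated with a minimal sketch. It is not the OSC algorithm used in the study: it assumes a simplified, direct-orthogonalization-style filter, and the function name `osc_like_filter`, the synthetic data, and all parameter values are hypothetical choices made for illustration only.

```python
import numpy as np


def osc_like_filter(X, y, n_remove=2):
    """Remove the n_remove largest y-orthogonal components from X.

    Simplified stand-in for OSC (an assumption, not the study's method):
    X is split into a part correlated with y and a part orthogonal to y,
    and the leading principal components of the orthogonal part are removed.
    """
    Xc = X - X.mean(axis=0)          # column-centre the X block
    yc = y - y.mean()                # centre the class vector
    # Part of X that is orthogonal to y
    X_orth = Xc - np.outer(yc, yc @ Xc) / (yc @ yc)
    # Leading components of the y-orthogonal variation
    U, s, Vt = np.linalg.svd(X_orth, full_matrices=False)
    T = U[:, :n_remove] * s[:n_remove]   # scores, orthogonal to y
    P = Vt[:n_remove].T                  # loadings
    X_filt = Xc - T @ P.T
    # Remaining sum of squares in X, as a fraction of the original
    rss_fraction = np.sum(X_filt ** 2) / np.sum(Xc ** 2)
    return X_filt, T, rss_fraction


# Synthetic stand-in data (NOT the study's spectra): 78 samples, 50 bins,
# a weak class effect buried under two strong nuisance components.
rng = np.random.default_rng(0)
n, p = 78, 50
y = np.where(np.arange(n) < n // 2, 1.0, -1.0)
nuisance = 3.0 * rng.normal(size=(n, 2)) @ rng.normal(size=(2, p))
X = 0.5 * np.outer(y, rng.normal(size=p)) + nuisance + 0.1 * rng.normal(size=(n, p))

X_filt, T, rss_fraction = osc_like_filter(X, y, n_remove=2)
print(f"remaining sum of squares in X: {100 * rss_fraction:.2f}%")
```

In this sketch the removed scores are exactly orthogonal to the centred class vector, so the covariance between X and Y is unchanged by the filter. A full OSC implementation iterates between the scores and a regression on X, and the number of removed components would be chosen using the criteria listed in the abstract.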