Nonnegative matrix factorization for the improvement in sensitivity of discovering potentially disease-related genes
Juan Zhang, Lifang Zhang, Gang Yang, Di Wu, Lina Jiang, Liqiu Huang, Zhining Wen, Menglong Li
DOI: https://doi.org/10.1016/j.chemolab.2013.05.004
IF: 4.175
2013-01-01
Chemometrics and Intelligent Laboratory Systems
Abstract: DNA microarray technology is now widely used in clinical research to generate gene expression profiles from biological samples. Identifying differentially expressed genes (DEGs) between two phenotype groups or distinct biological conditions from these data is a crucial step in discovering disease biomarkers. However, clinical samples usually contain multiple cell types. This heterogeneous cell population significantly affects gene expression patterns and can mask the biological difference between the two groups of samples being compared. Using the mixed gene expression profile of multiple cell types, rather than that of the cell type of interest, to identify DEGs seriously decreases the sensitivity of discovering disease-related genes. We therefore propose nonnegative matrix factorization (NMF), an unsupervised learning method that has been successfully applied in bioinformatics research, for extracting the gene expression profile of the cell type of interest from the mixed profile of a heterogeneous cell population. We first evaluated the performance of the NMF algorithm in deconvolving gene expression data using a well-controlled data set comprising gene expression profiles from three tissues and eleven different mixtures with known proportions. NMF was then applied to human whole-blood gene expression data from 24 kidney transplant recipients to estimate the pure gene expression profiles of five major blood cell types, which were subsequently used to identify genes related to acute rejection of kidney transplants. The number of DEGs (probe sets) identified between stable post-transplant recipients and those experiencing acute rejection was greater for each of the five blood cell type profiles than for the whole-blood samples.
Finally, the DEGs were submitted to Gene Set Enrichment Analysis (GSEA) for enrichment of signaling pathways and gene ontology terms. Several enriched pathways and gene ontology terms were significantly associated with renal transplant rejection when the submitted DEGs were identified from the two high-abundance blood cell types, whereas no pathways or gene ontology terms were enriched when the submitted DEGs were identified from whole-blood samples. Our results indicate that using the gene expression profile of a specific cell type deconvoluted by NMF can efficiently increase the sensitivity of discovering potentially disease-related genes. In addition, this unsupervised method can estimate the pure gene expression profile of a specific cell type from mixtures with no prior knowledge of cell proportions.
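
The deconvolution idea described in the abstract can be illustrated with a minimal sketch. It assumes the mixed expression matrix V (genes x samples) is approximated as V ≈ W H, where the columns of W are estimated cell-type profiles and the columns of H are the per-sample mixing weights. The abstract does not state which NMF variant was used; the sketch below uses Lee-Seung multiplicative updates for the Frobenius objective as an assumption, and the function names `nmf_deconvolve` and `proportions`, the parameter defaults, and the synthetic data are all hypothetical.

```python
import numpy as np


def nmf_deconvolve(V, k, n_iter=1000, seed=0, eps=1e-9):
    """Factor a nonnegative matrix V (genes x samples) as V ~ W @ H.

    W (genes x k) holds one estimated expression profile per component,
    H (k x samples) holds the weight of each component in each sample.
    Uses Lee-Seung multiplicative updates (an assumed choice, not
    necessarily the variant used in the paper).
    """
    V = np.asarray(V, dtype=float)
    if (V < 0).any():
        raise ValueError("V must be nonnegative")
    rng = np.random.default_rng(seed)
    n_genes, n_samples = V.shape
    # Random positive initialisation keeps every entry strictly positive.
    W = rng.random((n_genes, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        # eps in the denominators guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def proportions(H):
    """Heuristic: rescale each sample's weights so they sum to 1."""
    return H / H.sum(axis=0, keepdims=True)


if __name__ == "__main__":
    # Synthetic stand-in data: 3 pure profiles mixed into 11 samples.
    rng = np.random.default_rng(1)
    W_true = rng.random((200, 3))
    H_true = rng.dirichlet(np.ones(3), size=11).T  # columns sum to 1
    V = W_true @ H_true

    W, H = nmf_deconvolve(V, k=3)
    rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
    print("relative reconstruction error:", rel_err)
```

NMF components are recovered only up to permutation and scaling, so matching each column of W to a named cell type needs extra information such as marker genes. The normalisation in `proportions` is likewise a heuristic that is meaningful only when the columns of W are on a comparable scale.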