A feasible roadmap for unsupervised deconvolution of two-source mixed gene expressions

Niya Wang, Eric P. Hoffman, Robert Clarke, Zhen Zhang, David M. Herrington, Ie-Ming Shih, Douglas A. Levine, Guoqiang Yu, Jianhua Xuan, Yue Wang
DOI: https://doi.org/10.48550/arXiv.1310.7033
IF: 5.414
2013-10-25
Machine Learning
Abstract: Tissue heterogeneity is a major confounding factor in studying individual populations that cannot be resolved directly by global profiling. Experimental solutions to mitigate tissue heterogeneity are expensive, time-consuming, inapplicable to existing data, and may alter the original gene expression patterns. Here we ask whether it is possible to deconvolute two-source mixed expressions (estimating both proportions and cell-specific profiles) from two or more heterogeneous samples without requiring any prior knowledge. Supported by a well-grounded mathematical framework, we argue that both constituent proportions and cell-specific expressions can be estimated in a completely unsupervised mode when cell-specific marker genes, which do not have to be known a priori, exist for each of the constituent cell types. We demonstrate the performance of unsupervised deconvolution on both simulated and real gene expression data, together with perspective discussions.
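
To make the marker-gene argument concrete, the following is a minimal, noise-free sketch and not the authors' implementation. It assumes a linear mixing model X = A S with two samples and two cell types, each row of A summing to one, and strictly positive expression apart from marker genes, which are expressed in only one cell type. Under these assumptions, the between-sample expression ratio of every gene lies between the ratios of the two marker groups, so the extreme ratios determine the proportions. The function names, the mixing matrix, and the simulated data are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_mixture(n_genes=200, n_markers=10, seed=0):
    """Simulate two mixed samples from two cell types (illustrative data only)."""
    rng = np.random.default_rng(seed)
    # Cell-specific profiles: rows are cell types, columns are genes.
    S = rng.uniform(1.0, 10.0, size=(2, n_genes))
    S[1, :n_markers] = 0.0               # markers of type 1: absent in type 2
    S[0, n_markers:2 * n_markers] = 0.0  # markers of type 2: absent in type 1
    # Assumed mixing proportions: rows are samples and each row sums to 1.
    A = np.array([[0.7, 0.3],
                  [0.2, 0.8]])
    X = A @ S                            # observed mixed expressions
    return X, A, S


def deconvolve_two_source(X):
    """Estimate proportions and profiles from a 2 x n_genes mixture matrix.

    The cell type that is relatively enriched in sample 1 is labelled type 1.
    """
    ratio = X[0] / X[1]                  # per-gene ratio between the two samples
    r_hi, r_lo = ratio.max(), ratio.min()
    # Type-1 markers satisfy x1/x2 = a11/a21 = r_hi.
    # Type-2 markers satisfy x1/x2 = (1 - a11)/(1 - a21) = r_lo.
    a21 = (1.0 - r_lo) / (r_hi - r_lo)
    a11 = r_hi * a21
    A_hat = np.array([[a11, 1.0 - a11],
                      [a21, 1.0 - a21]])
    # With A estimated, the cell-specific profiles follow by inverting the mixing.
    S_hat = np.linalg.solve(A_hat, X)
    return A_hat, S_hat


X, A_true, S_true = simulate_mixture()
A_hat, S_hat = deconvolve_two_source(X)
print(np.round(A_hat, 3))
```

With noisy real data the extreme ratios would have to be estimated robustly, for example from clusters of genes near the boundary rather than from a single minimum and maximum; that refinement is left out of this sketch.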