Robust and accurate estimation of cellular fraction from tissue omics data via ensemble deconvolution

Manqi Cai, Molin Yue, Tianmeng Chen, Jinling Liu, Erick Forno, Timothy Billiar, Juan Celedón, Chris McKennan, Wei Chen, Jiebiao Wang, Xinhua Lu
DOI: https://doi.org/10.1093/bioinformatics/btac279
IF: 5.8
2022-04-19
Bioinformatics
Abstract:
Motivation: Tissue-level omics data, such as transcriptomics and epigenomics, are an average across diverse cell types. To extract cell-type-specific (CTS) signals, dozens of cellular deconvolution methods have been proposed to infer cell-type fractions from tissue-level data. However, these methods produce vastly different results under various real data settings. Simulation-based benchmarking studies showed no universally best deconvolution approach. Ensemble methods have been attempted, but they only aggregate multiple single-cell references or reference-free deconvolution methods.
Results: To achieve a robust estimation of cellular fractions, we proposed EnsDeconv (Ensemble Deconvolution), which adopts CTS robust regression to synthesize the results from eleven single deconvolution methods, ten reference datasets, five marker gene selection procedures, five data normalizations, and two transformations. Unlike most benchmarking studies, which are based on simulations, we compiled four large real datasets totaling 4,937 tissue samples from different tissues, each with measured cellular fractions and bulk gene expression. Comprehensive evaluations demonstrated that EnsDeconv yields more stable, robust, and accurate fractions than existing methods. We illustrated that EnsDeconv-estimated cellular fractions enable various CTS downstream analyses, such as identifying differential fractions associated with clinical variables. We further extended EnsDeconv to analyze bulk DNA methylation data.
Availability: EnsDeconv is freely available as an R package from https://github.com/randel/EnsDeconv.
Supplementary information: Supplementary data are available at Bioinformatics online.
biochemical research methods, biotechnology & applied microbiology, mathematical & computational biology
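
The general idea of synthesizing fraction estimates from many deconvolution scenarios can be illustrated with a small sketch. This is not the EnsDeconv implementation, and the abstract does not spell out how its CTS robust regression is formulated. The sketch assumes a simplified stand-in: an element-wise median across scenarios (the minimizer of the sum of absolute deviations), followed by renormalization so that each sample's fractions sum to one. The function name `ensemble_fractions` and the toy numbers are hypothetical.

```python
import numpy as np


def ensemble_fractions(estimates):
    """Combine cell-type fraction estimates from several scenarios.

    estimates: array-like of shape (n_scenarios, n_samples, n_cell_types),
    where each scenario is one combination of method, reference, and so on.
    Hypothetical helper for illustration only; not part of the EnsDeconv package.
    """
    stacked = np.asarray(estimates, dtype=float)
    # Element-wise median across scenarios; NaNs (failed scenarios) are ignored.
    combined = np.nanmedian(stacked, axis=0)
    # Fractions cannot be negative.
    combined = np.clip(combined, 0.0, None)
    # Renormalize each sample so its fractions sum to one.
    row_sums = combined.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # leave all-zero rows unchanged
    return combined / row_sums


# Toy input: three scenarios, one sample, two cell types.
# The third scenario is an outlier and has little influence on the median.
toy = [
    [[0.6, 0.4]],
    [[0.7, 0.3]],
    [[0.1, 0.9]],
]
result = ensemble_fractions(toy)
```

A median is used here only because it is a simple robust aggregator; a mean would be pulled strongly toward the outlying third scenario.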