Abstract: The accuracy of current deconvolution methods relies largely on the quality of cell-type expression references. However, the single-cell (sc) and single-nucleus (sn) RNA-seq data used to build the reference are usually generated in independent studies, separate from the bulk RNA-seq data to be deconvolved. This study design inherently introduces technical confounding factors as unwanted variation, which is not fully addressed by current methods. To evaluate the impact of this variation on deconvolution accuracy, we generated a benchmark dataset in which bulk and snRNA-seq profiling were performed on the same aliquot of single nuclei extracted from 24 healthy retina samples. All donor eye samples were collected within six hours post-mortem and were free of disease. This study design guarantees that the matched sequencing data share the same cell-type compositions, so that cross-platform technical artifacts are the remaining confounding factor. We used the benchmark dataset to evaluate seven current deconvolution methods and found that they performed much worse on matched real bulk data than on matched pseudo-bulks formed by summing the single-cell data. This finding suggests that none of these methods has fully addressed the major technical artifacts between bulk and single-cell sequencing platforms. We therefore propose DeMix.SC, a new deconvolution framework that optimizes deconvolution parameters using a small set of matched bulk and sc/snRNA-seq data from the same tissue type. DeMix.SC includes two major steps. First, we measure the technical variation across genes and across platforms using the benchmark data. Second, we introduce a new weight function for each gene, producing a gene ranking that accounts for both platform-specific technical variation and cell-type-specific expression at the gene level.
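To make the two steps concrete, the following is a minimal illustrative sketch in Python, not the DeMix.SC implementation. The abstract does not specify the weight function, so everything here is an assumption chosen for illustration: the function names, the per-gene discrepancy (mean squared difference in log2 counts per million between matched real bulk and pseudo-bulk), the specificity score (share of a gene's reference expression in its top cell type), and the way the two are combined.

```python
import numpy as np

def pseudo_bulk(counts, sample_ids):
    """Sum single-cell counts (cells x genes) into one profile per sample."""
    sample_ids = np.asarray(sample_ids)
    samples = np.unique(sample_ids)  # sorted unique sample labels
    summed = np.vstack([counts[sample_ids == s].sum(axis=0) for s in samples])
    return samples, summed

def log_cpm(counts):
    """Scale each row (sample) to counts per million, then take log2(x + 1)."""
    lib = counts.sum(axis=1, keepdims=True)
    return np.log2(counts / lib * 1e6 + 1.0)

def technical_discrepancy(bulk, pseudo):
    """Hypothetical step 1: per-gene mean squared log2-CPM difference
    between matched real bulk and pseudo-bulk (both samples x genes)."""
    return ((log_cpm(bulk) - log_cpm(pseudo)) ** 2).mean(axis=0)

def specificity(reference):
    """Per-gene share of expression in the top cell type
    (reference is genes x cell types); 0 for unexpressed genes."""
    total = reference.sum(axis=1).astype(float)
    top = reference.max(axis=1).astype(float)
    return np.divide(top, total, out=np.zeros_like(total), where=total > 0)

def gene_weights(bulk, pseudo, reference):
    """Hypothetical step 2: favor cell-type-specific genes and penalize
    genes that disagree across platforms. Not the published weight."""
    return specificity(reference) / (1.0 + technical_discrepancy(bulk, pseudo))

# Toy data: 4 nuclei from 2 samples, 3 genes, 2 cell types.
sc_counts = np.array([[10, 0, 5], [20, 0, 5], [0, 30, 10], [0, 10, 10]])
_, pb = pseudo_bulk(sc_counts, ["a", "a", "b", "b"])
real_bulk = pb * np.array([1, 1, 2])          # gene 2 inflated in real bulk
ref = np.array([[9.0, 1.0], [1.0, 9.0], [5.0, 5.0]])
weights = gene_weights(real_bulk, pb, ref)
ranking = np.argsort(-weights)                # gene indices, highest weight first
```

In this sketch, a gene ranks highly only if it is both concentrated in one cell type and measured consistently across the two platforms; any real application would substitute the weight function defined by the method itself.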
Using the benchmark data for retina, we applied DeMix.SC to previously published human retinal RNA-seq data from 523 individuals with different stages of age-related macular degeneration (AMD). We observed that DeMix.SC can accurately capture the cell-type composition shifts in the AMD retina. DeMix.SC revealed a significant drop in rod cells as well as increases in astrocytes, bipolar cells, and Müller cells in the AMD retina compared with the non-AMD group. The proportion changes of the latter three minor cell types were not identified by other methods, whereas DeMix.SC revealed this trend. In summary, DeMix.SC integrates benchmark data to improve deconvolution accuracy in retina samples. Our method is generic and can be applied to other disease conditions, such as deciphering cell-type heterogeneity in cancer. We expect that DeMix.SC will help revolutionize downstream cell-type-specific analysis of bulk RNA-seq data and identify cellular targets of human diseases.

Citation Format: Shuai Guo, Xuesen Cheng, Andrew Koval, Shuangxi Ji, Qingnan Liang, Yumei Li, Leah A. Owen, Ivana K. Kim, John Weinstein, Scott Kopetz, John Paul Shen, Margaret M. DeAngelis, Rui Chen, Wenyi Wang. Integration with benchmark data of paired bulk and single-cell RNA sequencing data substantially improves the accuracy of bulk tissue deconvolution. [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 4273.
