Abstract: The accuracy of current deconvolution methods relies largely on the quality of cell-type expression references. However, the single-cell (sc) and single-nucleus (sn) RNA-seq data used to build the reference are usually generated in independent studies, distinct from the bulk RNA-seq data to be deconvolved. This study design inherently introduces technical confounding factors as unwanted variation, which current methods do not fully address. To evaluate the impact of this variation on deconvolution accuracy, we generated a benchmark dataset in which bulk and snRNA-seq profiling were performed on the same aliquot of single nuclei extracted from 24 healthy retina samples. All donor eye samples were collected within six hours post-mortem and were free of any disease. This design guarantees that the matched sequencing data share the same cell-type composition, so that cross-platform technical artifacts become the remaining confounding factor. We used the benchmark dataset to evaluate seven current deconvolution methods and found that they performed much worse on matched real bulk data than on matched pseudo-bulks, which were summations of the single-cell data. This finding suggests that none of these methods has fully addressed the major technical artifacts between bulk and single-cell sequencing platforms. We therefore propose DeMix.SC, a new deconvolution framework that optimizes deconvolution parameters using a small set of matched bulk and sc/snRNA-seq data from the same tissue type. DeMix.SC includes two major steps. First, we measure the technical variation across genes and across platforms using the benchmark data. Second, we introduce a new weight function for each gene that produces a ranking that accounts for both platform-specific technical variation and cell-type-specific expression at the gene level.
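The two steps can be illustrated with a minimal sketch. It is not the DeMix.SC implementation: the abstract does not specify the variation measure or the weight function, so the mean squared log-ratio and the `specificity / (1 + variation)` weight below are assumptions chosen only for illustration, and all function names are hypothetical.

```python
import math
from statistics import mean

def pseudo_bulk(cell_counts):
    """Sum per-gene counts over all cells (rows) of one sample."""
    return [sum(gene_col) for gene_col in zip(*cell_counts)]

def to_fractions(counts):
    """Normalize a count vector by its library size."""
    total = sum(counts)
    return [c / total for c in counts]

def log_ratios(real_bulk, pseudo, eps=1e-6):
    """Per-gene log2(real bulk / pseudo-bulk) after normalization."""
    rb, pb = to_fractions(real_bulk), to_fractions(pseudo)
    return [math.log2((r + eps) / (p + eps)) for r, p in zip(rb, pb)]

def technical_variation(matched_pairs):
    """Assumed measure: per-gene mean squared log-ratio over matched samples.

    matched_pairs is a list of (real_bulk_counts, cell_by_gene_counts).
    """
    per_sample = [log_ratios(bulk, pseudo_bulk(cells))
                  for bulk, cells in matched_pairs]
    return [mean(v * v for v in gene_vals) for gene_vals in zip(*per_sample)]

def gene_weights(specificity, tech_var):
    """Assumed weight: favors high specificity and low technical variation."""
    return [s / (1.0 + t) for s, t in zip(specificity, tech_var)]

# Toy data: two matched samples, three genes (values are made up).
pairs = [
    ([40, 20, 20], [[10, 0, 5], [10, 10, 5]]),  # bulk agrees with pseudo-bulk
    ([2, 4, 4], [[4, 4, 2]]),                   # genes 0 and 2 disagree
]
tech_var = technical_variation(pairs)
weights = gene_weights([0.9, 0.9, 0.9], tech_var)
ranking = sorted(range(len(weights)), key=lambda g: -weights[g])
```

In this toy example the middle gene agrees across platforms in both samples, so it receives the largest weight and ranks first; genes whose real-bulk and pseudo-bulk fractions diverge are down-weighted.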
Using the benchmark data for retina, we applied DeMix.SC to previously published human retinal RNA-seq data from 523 individuals at different stages of age-related macular degeneration (AMD). We observed that DeMix.SC can accurately capture the cell-type composition shifts in the AMD retina. DeMix.SC revealed a significant drop in rod cells as well as increases in astrocytes, bipolar cells, and Müller cells in the AMD retina compared with the non-AMD group. The proportion changes of the latter three minor cell types were not identified by other methods, whereas DeMix.SC revealed this tendency. In summary, DeMix.SC integrates benchmark data to improve deconvolution accuracy in retina samples. Our method is generic and can be applied to other disease conditions, for example to decipher cell-type heterogeneity in cancer. We expect that DeMix.SC will help revolutionize the downstream cell-type-specific analysis of bulk RNA-seq data and identify cellular targets of human diseases.

Citation Format: Shuai Guo, Xuesen Cheng, Andrew Koval, Shuangxi Ji, Qingnan Liang, Yumei Li, Leah A. Owen, Ivana K. Kim, John Weinstein, Scott Kopetz, John Paul Shen, Margaret M. DeAngelis, Rui Chen, Wenyi Wang. Integration with benchmark data of paired bulk and single-cell RNA sequencing data substantially improves the accuracy of bulk tissue deconvolution [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 4273.
