ctQC improves biological inferences from single cell and spatial transcriptomics data
Vairavan Lakshmanan, Merve Kahraman, Dominique Camat Macalinao, Nicole Gunn, Prasanna Nori Venkatesh, Chang Meihuan, Cherylin Fu, Leow Wei Qiang, Iain Beehuat Tan, Shyam Prabhakar
DOI: https://doi.org/10.1101/2024.05.23.594978
2024-05-28
Abstract: Quality control (QC) is the first critical step in single cell and spatial data analysis pipelines. QC is particularly important when analysing data from primary human samples, since genuine biological signals can be obscured by debris, perforated cells, cell doublets and ambient RNA released into the 'soup' by cell lysis. Consequently, several QC methods for single cell data employ fixed or data-driven quality thresholds. While these approaches efficiently remove empty droplets, they often retain low-quality cells. Here, we propose cell type-specific QC (ctQC), a stringent, data-driven QC approach that adapts to cell type differences and discards soup and debris. Evaluating single cell RNA-seq data from colorectal tumors, human spleen, and peripheral blood mononuclear cells, we demonstrate that ctQC outperforms existing methods by improving cell type separation in downstream clustering, suppressing cell stress signatures, revealing patient-specific cell states, eliminating artifactual clusters and reducing ambient RNA artifacts. When applied to sequencing-based spatial RNA profiling data (Slide-seq), ctQC improved the spatial coherence of cell clusters and their consistency with anatomical structures. These results demonstrate that strict, data-driven, cell type-specific QC is applicable to diverse sample types and substantially improves the quality and reliability of biological inferences from single cell and spatial RNA profiles.
Bioinformatics
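
The contrast the abstract draws between a single data-driven threshold and thresholds that adapt to cell type can be illustrated with a toy example. The sketch below is not the ctQC algorithm, which the abstract does not specify. It assumes a generic median-minus-MAD outlier rule on log total counts, and the function names, the `n_mads` parameter, the cell type labels and the count values are all hypothetical.

```python
import numpy as np


def mad_lower_bound(values, n_mads=3.0):
    """Lower bound = median - n_mads * MAD (a generic outlier rule, assumed here)."""
    values = np.asarray(values, dtype=float)
    med = np.median(values)
    mad = np.median(np.abs(values - med))
    return med - n_mads * mad


def global_filter(log_counts, n_mads=3.0):
    """Keep-mask using one threshold computed over all cells."""
    log_counts = np.asarray(log_counts, dtype=float)
    return log_counts >= mad_lower_bound(log_counts, n_mads)


def per_cell_type_filter(log_counts, labels, n_mads=3.0):
    """Keep-mask using a separate threshold for each cell type label."""
    log_counts = np.asarray(log_counts, dtype=float)
    labels = np.asarray(labels)
    keep = np.zeros(log_counts.shape[0], dtype=bool)
    for lab in np.unique(labels):
        idx = labels == lab
        keep[idx] = log_counts[idx] >= mad_lower_bound(log_counts[idx], n_mads)
    return keep


# Hypothetical toy data: type "A" cells are RNA-rich, type "B" cells are RNA-poor.
# The last cell of each type has unusually low counts for its own type.
log_counts = np.array([9.0, 9.1, 8.9, 9.2, 9.0, 6.0,
                       6.0, 6.1, 5.9, 6.2, 6.0, 3.0])
labels = np.array(["A"] * 6 + ["B"] * 6)

keep_global = global_filter(log_counts)
keep_ct = per_cell_type_filter(log_counts, labels)

print("kept by global threshold:       ", int(keep_global.sum()))
print("kept by per-cell-type threshold:", int(keep_ct.sum()))
```

In this toy data the global threshold is widened by the mixture of RNA-rich and RNA-poor cells and keeps all 12 cells, whereas the per-cell-type thresholds flag the low-count cell within each type and keep 10. A real pipeline would need cell type labels (or a preliminary clustering) before thresholds can be set this way, which this sketch simply takes as given.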