DemoTape: Computational demultiplexing of targeted single-cell sequencing data

Nico Borgsmueller, Jack Kuipers, Johannes Gawron, Marco Roncador, Marcel Fabian Pohly, Erkin Acar, Thi Huong Lan Do, Stefanie Ute Reisenauer, Mirjam Judith Feldkamp, Christian Beisel, Thorsten Zenz, Andreas Moor, Niko Beerenwinkel
DOI: https://doi.org/10.1101/2024.12.06.627152
2024-12-10
Abstract: Single-cell sequencing can provide novel insights into the understanding and treatment of diseases. In cancer, for example, intratumor heterogeneity is a major cause of treatment resistance and relapse. Although technological progress in the last decade has substantially increased the throughput of sequenced cells, single-cell sequencing remains cost- and labor-intensive. Multiplexing, i.e., the pooling and subsequent joint preparation and sequencing of samples, followed by a demultiplexing step, is a common practice to reduce expenses and confounding batch effects, especially in single-cell RNA sequencing. Here, we introduce demoTape, a computational demultiplexing method for targeted single-cell DNA sequencing (scDNA-seq) data, based on a distance metric between individual cells at single-nucleotide polymorphism (SNP) loci. To validate demoTape, we used the Tapestri platform to sequence samples from three B-cell lymphoma patients both separately and multiplexed. Using the three individually sequenced samples, we further simulated multiplexed ground truth data and show that demoTape outperforms state-of-the-art demultiplexing methods designed for RNA sequencing data. Additionally, we demonstrate through downsampling that the inferred clonal composition remained largely stable, despite the inevitable loss in resolution of low-frequency clones. The demultiplexing also revealed that up to 50% of the sequenced droplets contained two cells instead of one. Despite such high noise rates, we found similar genotypes, clones, and evolutionary histories in all three samples when comparing the individual with the demultiplexed samples. Multiplexing and subsequent genotype-based demultiplexing of scDNA-seq will therefore reduce costs and workload, eventually allowing the sequencing of more samples. This will open new possibilities and accelerate the investigation of biological questions where cellular heterogeneity on the genomic level plays a crucial role.
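To make the idea of genotype-based demultiplexing concrete, the following is a minimal sketch, not demoTape's implementation. The abstract does not specify the distance metric or the clustering procedure, so everything here is an assumption for illustration: genotypes are coded as 0/1/2 alternate-allele counts with `None` for missing calls, the hypothetical `snp_distance` is a normalized mean absolute difference over loci observed in both cells, and the hypothetical `demultiplex` uses plain average-linkage agglomerative clustering. Doublets, which the abstract reports as frequent, are not handled.

```python
from itertools import combinations


def snp_distance(a, b):
    """Illustrative distance: mean absolute genotype difference, scaled to [0, 1],
    over SNP loci that are observed (not None) in both cells."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return 1.0  # assumption: cells with no shared loci are maximally distant
    return sum(abs(x - y) for x, y in shared) / (2 * len(shared))


def demultiplex(cells, n_samples):
    """Assign each cell a cluster label by average-linkage agglomerative
    clustering on pairwise SNP distances, stopping at n_samples clusters."""
    dist = {
        (i, j): snp_distance(cells[i], cells[j])
        for i, j in combinations(range(len(cells)), 2)
    }

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        total = sum(dist[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [[i] for i in range(len(cells))]
    while len(clusters) > n_samples:
        # Merge the two closest clusters (a < b, so deleting b is safe).
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(cells)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy data (made up): six cells from three pooled samples, five SNP loci each.
cells = [
    [0, 2, 1, 0, None],
    [0, 2, 1, None, 2],
    [2, 0, 0, 1, 0],
    [2, 0, None, 1, 0],
    [1, 1, 2, 2, 1],
    [1, None, 2, 2, 1],
]
labels = demultiplex(cells, n_samples=3)
print(labels)
```

On this toy input, cells that share germline genotypes end up with the same label. Real targeted scDNA-seq data would additionally require handling allelic dropout, sequencing errors, and doublet droplets, which this sketch deliberately ignores.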
Bioinformatics