deMULTIplex2: robust sample demultiplexing for scRNA-seq

Qin Zhu, Daniel N. Conrad, Zev J. Gartner
DOI: https://doi.org/10.1186/s13059-024-03177-y
IF: 17.906
2024-02-02
Genome Biology
Abstract: Sample multiplexing enables pooled analysis during single-cell RNA sequencing workflows, thereby increasing throughput and reducing batch effects. A challenge for all multiplexing techniques is to link sample-specific barcodes with cell-specific barcodes, then demultiplex sample identity post-sequencing. However, existing demultiplexing tools fail under many real-world conditions where barcode cross-contamination is an issue. We therefore developed deMULTIplex2, an algorithm inspired by a mechanistic model of barcode cross-contamination. deMULTIplex2 employs generalized linear models and expectation–maximization to probabilistically determine the sample identity of each cell. Benchmarking reveals superior performance across various experimental conditions, particularly on large or noisy datasets with unbalanced sample compositions.
genetics & heredity, biotechnology & applied microbiology
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper addresses sample demultiplexing in single-cell RNA sequencing (scRNA-seq) experiments. Specifically:

1. **Background**:
   - Single-cell sequencing technology has revolutionized biomedical research, providing high-resolution, high-throughput, unbiased analysis of healthy and diseased tissues.
   - In recent years, single-cell sample multiplexing techniques based on lipid-tag indexing, barcoded antibodies, chemical labeling, nuclear hashing, lentiviral infection, and similar approaches have further improved the scalability of scRNA-seq, allowing multiple samples from different experimental conditions to be pooled and sequenced together.
   - While these methods reduce experimental costs and batch effects, the pooled data must then be demultiplexed so that each cell is correctly assigned to its source sample.
2. **Existing Problems**:
   - In practical experiments, tag cross-contamination, differences in tag capture rates, and the inherent noise of single-cell sequencing cause the signal-to-noise ratio to vary significantly, which complicates demultiplexing.
   - Existing demultiplexing tools perform poorly under many real-world conditions, especially in the presence of barcode cross-contamination.
3. **Solution**:
   - The paper introduces a new algorithm, deMULTIplex2, which models tag cross-contamination based on physical mechanisms and uses a negative binomial generalized linear model (GLM-NB) together with the expectation-maximization (EM) algorithm to probabilistically determine the sample identity of each cell.
   - Benchmarking on simulated and real data shows that deMULTIplex2 outperforms existing tools across various experimental conditions, particularly on large or noisy datasets with unbalanced sample compositions.

In summary, the paper aims to develop a more robust demultiplexing algorithm that copes with the prevalent issue of tag cross-contamination in single-cell sequencing experiments, thereby improving the accuracy and reliability of demultiplexing.
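
To make the idea of probabilistic assignment with expectation-maximization more concrete, here is a deliberately simplified sketch. It is not the deMULTIplex2 implementation and does not reproduce its mechanistic contamination model or its GLM-NB: it swaps in a plain two-component Poisson mixture fitted to the counts of a single tag. The function name `em_two_poisson`, the quantile-based initialisation, and the simulated counts are all assumptions made for illustration.

```python
import numpy as np


def em_two_poisson(counts, n_iter=200, tol=1e-8):
    """Fit a two-component Poisson mixture to one tag's per-cell counts via EM.

    Illustrative stand-in only (NOT the GLM-NB model used by deMULTIplex2).
    Returns (posterior that each cell is tag-positive,
             background rate, signal rate, fraction of positive cells).
    """
    x = np.asarray(counts, dtype=float)

    # Hypothetical initialisation: low and high quantiles of the counts.
    lam_bg = max(float(np.percentile(x, 25)), 0.5)
    lam_sig = max(float(np.percentile(x, 90)), lam_bg + 1.0)
    pi_sig = 0.5
    prev_ll = -np.inf

    for _ in range(n_iter):
        # E-step: per-component log joint density. The log(x!) term is
        # shared by both components, so it cancels in the posterior.
        log_bg = np.log1p(-pi_sig) + x * np.log(lam_bg) - lam_bg
        log_sig = np.log(pi_sig) + x * np.log(lam_sig) - lam_sig
        log_norm = np.logaddexp(log_bg, log_sig)
        post = np.exp(log_sig - log_norm)

        # M-step: responsibility-weighted means and mixing proportion.
        pi_sig = float(np.clip(post.mean(), 1e-6, 1 - 1e-6))
        lam_sig = max(float((post * x).sum() / max(post.sum(), 1e-12)), 1e-6)
        lam_bg = max(
            float(((1 - post) * x).sum() / max((1 - post).sum(), 1e-12)), 1e-6
        )

        # Stop when the log-likelihood (up to a constant) stops improving.
        ll = float(log_norm.sum())
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll

    return post, lam_bg, lam_sig, pi_sig


# Simulated counts for one tag (assumed values): 300 cells carrying only
# contaminating background, followed by 100 truly tagged cells.
rng = np.random.default_rng(0)
counts = np.concatenate([rng.poisson(3, size=300), rng.poisson(60, size=100)])
post, lam_bg, lam_sig, pi_sig = em_two_poisson(counts)
is_positive = post > 0.5  # hard call derived from the soft assignment
```

Each cell receives a posterior probability rather than a hard threshold call, which is the general property the summary attributes to deMULTIplex2. A real dataset would need one such fit per tag, plus a model in which background contamination depends on each cell's total tag count, which this sketch omits.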