Abstract: Cell hashing, a nucleotide barcode-based method that allows users to pool multiple samples and demultiplex in downstream analysis, has gained widespread popularity in single-cell sequencing due to its compatibility, simplicity, and cost-effectiveness. Despite these advantages, the performance of this method remains unsatisfactory under certain circumstances, especially in experiments that have imbalanced sample sizes or use many hashtag antibodies. Here, we introduce a hybrid demultiplexing strategy that increases accuracy and cell recovery in multi-sample single-cell experiments. This approach correlates the results of cell hashing and genetic variant clustering, enabling precise and efficient cell identity determination without additional experimental costs or efforts. In addition, we developed HTOreader, a demultiplexing tool for cell hashing that improves the accuracy of cut-off calling by avoiding the dominance of negative signals in experiments with many hashtags or imbalanced sample sizes. When compared to existing methods using real-world datasets, this hybrid approach and HTOreader consistently generate reliable results with increased accuracy and cell recovery.

What problem does this paper attempt to address?

The paper addresses the poor performance of existing cell hashing methods in single-cell sequencing experiments that have imbalanced sample sizes or use many hashtags. It proposes a hybrid demultiplexing strategy that improves accuracy and cell recovery in multi-sample single-cell experiments. The strategy combines the results of cell hashing and genetic variant clustering, and determines cell identities accurately and efficiently without additional experimental cost.

### Main problems

1. **Background signal interference**: In real datasets, a given hashtag may generate background signal, causing unlabeled cells to be misclassified as single-positive cells.
2. **Cross-contamination**: During sample pooling, hashtag spillover may make a single cell appear positive for multiple hashtags (a false double-positive). The frequency of such cells increases with the number of hashtags used.
3. **Improper staining conditions**: Inappropriate staining conditions sometimes reduce hashtag efficiency, leaving a large number of cells without a hashtag signal.

### Solutions

1. **Hybrid demultiplexing strategy**: Combine cell hashing with SNP-based demultiplexing so that the two methods cross-validate each other, improving the accuracy and reliability of the results.
2. **HTOreader tool**: A new demultiplexing tool for cell hashing. Its improved cut-off calling algorithm avoids the dominance of negative signals when many hashtags are used or when sample sizes are imbalanced, thereby improving demultiplexing accuracy.

### Key technologies

- **Data normalization**: Apply CLR (centered log-ratio) and log normalization to the raw counts.
- **Mixture model**: Fit a Gaussian mixture model to the normalized hashtag counts to separate the background group from the true-positive group.
- **Threshold determination**: Use the means and standard deviations of the two Gaussian distributions to determine the cut-off.
- **Sample identity assignment**: Determine the sample identity of each cell from the combined hashtag and SNP results.

### Performance evaluation

The paper compares the hybrid strategy with existing methods (such as MULTI-seq, GMM-Demux, DropletUtils, BFF_raw, and BFF_cluster) on multiple benchmark datasets and in-house datasets, and reports higher accuracy and cell recovery.

In conclusion, by proposing a hybrid demultiplexing strategy and developing the HTOreader tool, the paper addresses the limitations of existing cell hashing methods in practical applications and improves the accuracy and reliability of multi-sample single-cell experiments.
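The key technologies summarized above (normalization, a two-component Gaussian mixture, cut-off calling, and combining hashtag calls with SNP clusters) can be illustrated with a minimal, standard-library Python sketch. This is not HTOreader's code or API: the function names `clr_normalize`, `fit_two_gaussians`, `positive_cutoff`, and `hybrid_assign` are hypothetical, the CLR formula is one common variant, and two rules are assumptions made for illustration because the summary does not specify them: the cut-off is placed where the two weighted component densities are equal, and each SNP cluster is matched to a sample by majority vote over cells with a hashtag call.

```python
import math
from collections import Counter, defaultdict


def clr_normalize(counts):
    """One common CLR variant: log1p(x / geometric mean of (1 + x))."""
    geo_mean = math.exp(sum(math.log1p(c) for c in counts) / len(counts))
    return [math.log1p(c / geo_mean) for c in counts]


def log_density(x, weight, mean, sd):
    """Log of weight * Normal(x; mean, sd), up to a constant shared by both components."""
    z = (x - mean) / sd
    return math.log(weight) - math.log(sd) - 0.5 * z * z


def fit_two_gaussians(values, n_iter=100):
    """Fit a two-component 1D Gaussian mixture by EM; returns two (weight, mean, sd) tuples."""
    n = len(values)
    overall_mean = sum(values) / n
    overall_sd = math.sqrt(sum((v - overall_mean) ** 2 for v in values) / n) or 1.0
    # Start the background component at the minimum and the positive one at the maximum.
    params = [(0.5, min(values), overall_sd), (0.5, max(values), overall_sd)]
    for _ in range(n_iter):
        # E-step: posterior probability that each value belongs to the second component.
        resp = []
        for v in values:
            diff = log_density(v, *params[0]) - log_density(v, *params[1])
            resp.append(1.0 / (1.0 + math.exp(min(diff, 700.0))))
        # M-step: re-estimate weight, mean, and sd of each component.
        new_params = []
        for r in ([1.0 - p for p in resp], resp):
            total = max(sum(r), 1e-9)
            mean = sum(ri * v for ri, v in zip(r, values)) / total
            var = sum(ri * (v - mean) ** 2 for ri, v in zip(r, values)) / total
            new_params.append((max(total / n, 1e-9), mean, max(math.sqrt(var), 1e-6)))
        params = new_params
    return params


def positive_cutoff(params):
    """Assumed rule: the point between the two means where the weighted densities are equal."""
    (w0, m0, s0), (w1, m1, s1) = sorted(params, key=lambda p: p[1])
    lo, hi = m0, m1
    for _ in range(60):  # bisection; assumes the two components are reasonably separated
        mid = (lo + hi) / 2.0
        if log_density(mid, w0, m0, s0) > log_density(mid, w1, m1, s1):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def hybrid_assign(hto_calls, snp_clusters):
    """Combine per-cell hashtag calls (sample name or None) with SNP cluster labels."""
    votes = defaultdict(Counter)
    for hto, cluster in zip(hto_calls, snp_clusters):
        if hto is not None:
            votes[cluster][hto] += 1
    # Assumption: each SNP cluster maps to the hashtag sample it overlaps most.
    cluster_to_sample = {c: cnt.most_common(1)[0][0] for c, cnt in votes.items()}
    final = []
    for hto, cluster in zip(hto_calls, snp_clusters):
        snp_sample = cluster_to_sample.get(cluster)
        if hto is None:
            final.append(snp_sample)  # no hashtag call: fall back to the SNP cluster
        elif hto == snp_sample:
            final.append(hto)  # both methods agree
        else:
            final.append("conflict")  # disagreement: flag for review
    return final
```

In this sketch, a cell with no hashtag call is rescued through its SNP cluster, which is the mechanism by which a hybrid approach can raise cell recovery. Cells where the two methods disagree are flagged rather than forced into a sample. A production workflow would more likely use an established mixture-model library than hand-written EM.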

A hybrid demultiplexing strategy that improves performance and robustness of cell hashing
