A hybrid demultiplexing strategy that improves performance and robustness of cell hashing

Lei Li, Jiayi Sun, Yanbin Fu, Siriruk Changrob, Joshua J C McGrath, Patrick C Wilson
DOI: https://doi.org/10.1093/bib/bbae254
IF: 9.5
2024-06-04
Briefings in Bioinformatics
Abstract: Cell hashing, a nucleotide barcode-based method that allows users to pool multiple samples and demultiplex in downstream analysis, has gained widespread popularity in single-cell sequencing due to its compatibility, simplicity, and cost-effectiveness. Despite these advantages, the performance of this method remains unsatisfactory under certain circumstances, especially in experiments that have imbalanced sample sizes or use many hashtag antibodies. Here, we introduce a hybrid demultiplexing strategy that increases accuracy and cell recovery in multi-sample single-cell experiments. This approach correlates the results of cell hashing and genetic variant clustering, enabling precise and efficient cell identity determination without additional experimental costs or efforts. In addition, we developed HTOreader, a demultiplexing tool for cell hashing that improves the accuracy of cut-off calling by avoiding the dominance of negative signals in experiments with many hashtags or imbalanced sample sizes. When compared to existing methods using real-world datasets, this hybrid approach and HTOreader consistently generate reliable results with increased accuracy and cell recovery.
biochemical research methods, mathematical & computational biology
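The hybrid idea in the abstract, correlating hashtag calls with genetic variant clusters, can be illustrated with a short Python sketch. It is a hypothetical example, not the authors' HTOreader or pipeline code. It assumes that each SNP cluster (from any genotype-based demultiplexer) corresponds to exactly one hashtagged sample, that per-cell calls are plain strings such as "Negative", "Doublet" and "Unassigned", and that the function names and decision order are invented for illustration. Each cluster is linked to the hashtag carried by most of its HTO-singlet cells; the genotype then decides identity, which is how a combined approach can recover HTO-negative cells and false double-positives.

```python
from collections import Counter, defaultdict

# Hypothetical label conventions for non-singlet calls.
HTO_NON_SINGLET = {"Negative", "Doublet"}
SNP_NON_SINGLET = {"Doublet", "Unassigned"}


def map_clusters_to_hashtags(hto_calls, snp_clusters, min_fraction=0.5):
    """Link each SNP cluster to the hashtag carried by most of its HTO-singlet cells."""
    votes = defaultdict(Counter)
    for hto, cluster in zip(hto_calls, snp_clusters):
        if hto not in HTO_NON_SINGLET and cluster not in SNP_NON_SINGLET:
            votes[cluster][hto] += 1
    mapping = {}
    for cluster, counter in votes.items():
        tag, n = counter.most_common(1)[0]
        # Keep the link only if the top hashtag holds at least min_fraction of the votes.
        if n / sum(counter.values()) >= min_fraction:
            mapping[cluster] = tag
    return mapping


def hybrid_assign(hto_calls, snp_clusters):
    """Combine per-cell HTO calls with per-cell SNP clusters (hypothetical rules)."""
    mapping = map_clusters_to_hashtags(hto_calls, snp_clusters)
    identities = []
    for hto, cluster in zip(hto_calls, snp_clusters):
        if cluster == "Doublet":
            identities.append("Doublet")         # two genotypes in one droplet
        elif cluster in mapping:
            identities.append(mapping[cluster])  # genotype decides, rescuing HTO negatives and doublets
        elif hto not in HTO_NON_SINGLET:
            identities.append(hto)               # no usable SNP call: keep the HTO singlet call
        else:
            identities.append("Unassigned")
    return identities


hto = ["HTO1", "HTO1", "HTO2", "HTO2", "Negative", "Doublet", "HTO1", "HTO2"]
snp = ["0", "0", "1", "1", "1", "0", "Doublet", "Unassigned"]
calls = hybrid_assign(hto, snp)
print(calls)
```

In this toy input the HTO-negative cell in SNP cluster 1 is rescued as HTO2, the HTO doublet in cluster 0 is reassigned to HTO1, and the cell with no SNP call keeps its HTO2 label.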
What problem does this paper attempt to address?
The paper addresses the poor performance of existing cell hashing methods in single-cell sequencing when sample sizes are imbalanced or many hashtags are used. Specifically, it proposes a hybrid demultiplexing strategy to improve accuracy and cell recovery in multi-sample single-cell experiments. The strategy combines the results of cell hashing with genetic variant clustering, so that cell identities can be determined accurately and efficiently without additional experimental cost.

### Main problems

1. **Background signal interference**: In real datasets, a given hashtag can produce background signal, causing unlabeled cells to be misclassified as single-positive cells.
2. **Cross-contamination**: Hashtag spillover during sample pooling can make a single cell appear positive for multiple hashtags (a false double-positive), and the frequency of such events increases with the number of hashtags used.
3. **Improper staining conditions**: Inappropriate staining conditions sometimes reduce hashtag labeling efficiency, leaving many cells with no hashtag signal.

### Solutions

1. **Hybrid demultiplexing strategy**: Combine cell hashing with SNP-based demultiplexing so that each method cross-validates the other, improving the accuracy and reliability of the results.
2. **HTOreader tool**: A new demultiplexing tool whose improved threshold-calling algorithm prevents negative signals from dominating when many hashtags are used or sample sizes are imbalanced, thereby improving demultiplexing accuracy.

### Key technologies

- **Data normalization**: Raw counts are normalized using the CLR (centered log-ratio) and log normalization methods.
- **Mixture model**: A Gaussian mixture model is fitted to the normalized hashtag counts to separate the background (negative) population from the true-positive population.
- **Threshold determination**: The means and standard deviations of the two fitted Gaussian distributions are used to determine the optimal threshold.
- **Sample identity assignment**: Each cell's sample identity is determined from the combined hashtag and SNP results.

### Performance evaluation

Compared with existing methods (MULTI-seq, GMM-Demux, DropletUtils, BFF_raw, and BFF_cluster) on multiple benchmark datasets and the authors' in-house datasets, the hybrid strategy shows a significant improvement in accuracy and cell recovery.

In conclusion, by proposing a hybrid demultiplexing strategy and developing the HTOreader tool, this paper addresses the limitations of existing cell hashing methods in practical applications and improves the accuracy and reliability of multi-sample single-cell experiments.
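The normalization, mixture-model, and threshold-determination steps listed under Key technologies can be illustrated with a short Python sketch. It is not HTOreader's code: the pseudocount-1 CLR variant (computed per hashtag across cells), the EM initialization, and especially the cut-off rule are assumptions chosen for illustration, and all function names are hypothetical. The assumed rule places the cut-off at the point lying the same number of standard deviations from both component means. Because it uses only the fitted means and standard deviations and ignores the mixing weights, the cut-off does not shift simply because background cells vastly outnumber positive ones, which is one possible reading of the paper's aim of avoiding negative-signal dominance.

```python
import math
import statistics


def clr(counts):
    """Centered log-ratio for one hashtag across cells (assumed pseudocount-1 variant)."""
    logs = [math.log1p(c) for c in counts]
    centre = statistics.fmean(logs)
    return [v - centre for v in logs]


def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def fit_two_gaussians(values, n_iter=200, min_sd=1e-3):
    """Fit a 1-D two-component Gaussian mixture by EM.

    Returns [(mean, sd, weight), ...] sorted so the background component comes first.
    """
    xs = sorted(values)
    half = len(xs) // 2
    # Initialise from the lower and upper halves of the sorted data.
    mu = [statistics.fmean(xs[:half]), statistics.fmean(xs[half:])]
    sd = [max(statistics.pstdev(xs[:half]), min_sd), max(statistics.pstdev(xs[half:]), min_sd)]
    w = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in values:
            p = [w[k] * normal_pdf(x, mu[k], sd[k]) for k in range(2)]
            total = sum(p) or 1e-300  # guard against underflow
            resp.append([pk / total for pk in p])
        # M-step: re-estimate mean, sd and weight of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-300
            mu[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, values)) / nk
            sd[k] = max(math.sqrt(var), min_sd)
            w[k] = nk / len(values)
    order = sorted(range(2), key=lambda k: mu[k])
    return [(mu[k], sd[k], w[k]) for k in order]


def hashtag_cutoff(values):
    """Hypothetical rule: the point equally many SDs from both means (weights ignored)."""
    (mu_neg, sd_neg, _), (mu_pos, sd_pos, _) = fit_two_gaussians(values)
    return (mu_neg * sd_pos + mu_pos * sd_neg) / (sd_neg + sd_pos)


# Imbalanced toy data: 90 background cells followed by 10 hashtag-positive cells.
counts = [3, 5, 8, 4, 6, 2, 7, 5, 4, 6] * 9 + [450, 520, 610, 380, 700, 490, 560, 430, 650, 500]
normalized = clr(counts)
cutoff = hashtag_cutoff(normalized)
positive = [v > cutoff for v in normalized]
print(f"cut-off = {cutoff:.2f}, positive cells = {sum(positive)}")
```

On this toy input the ten high-count cells fall above the cut-off and the ninety background cells below it. A real tool would repeat this per hashtag and then resolve cells that are positive for none or several hashtags.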