Estimating and correcting index hopping misassignments in single-cell RNA-seq data

Lingling Miao, Loren Collado, Savannah Barkdull, Yoshine Saito, Jay-Hyun Jo, Jungmin Han, Stefania Dell’Orso, Michael C. Kelly, Sean Conlan, Heidi H. Kong, Isaac Brownell
DOI: https://doi.org/10.1101/2024.10.21.619353
2024-10-24
Abstract: Index hopping causes read assignment errors in data from multiplexed sequencing libraries. This issue has become more prevalent with the widespread use of high-capacity sequencers and highly multiplexed single-cell RNA sequencing (scRNA-seq) libraries. We conducted deep, plate-based scRNA-seq on a mixed population of mouse skin cells. Analysis of transcriptomes from 1152 cells identified four distinct cell types. To estimate the error rate in sample assignment due to index hopping, we employed differential expression analysis to identify signature genes that were highly and specifically expressed in each cell type. We quantified the proportion of misassigned reads by examining the detection rates of signature genes in other cell types. Remarkably, regardless of gene expression levels, we estimated that 0.65% of reads per gene were assigned to an incorrect cell across our data. To computationally compensate for index hopping, we developed a simple correction method wherein, for each gene, 0.65% of the library’s average expression level was subtracted from the expression of each cell. This correction had notable effects on transcriptome analyses, including increased cell-cell clustering distance and alterations in intermediate state assignments of cell differentiation. These findings underscore the potential impact of index hopping on experimental results. In conclusion, we devised a straightforward method to estimate and correct for the index hopping rate by quantifying misassigned genes in distinct cell types within an scRNA-seq library. This approach can be applied to any barcoded, multiplexed scRNA-seq library containing cells with distinct expression profiles, allowing for correction of the expression matrix before conducting biological analysis.
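The abstract does not spell out the exact estimator, so the following is only a minimal sketch of one plausible reading. It assumes a signature gene is truly absent outside its own cell type, so any expression seen in other cell types is hopped signal proportional to the library-wide mean of that gene. The function name, the data layout (a dict of per-cell values per gene), and the toy numbers are all hypothetical and are not taken from the paper.

```python
from statistics import mean, median


def estimate_hopping_rate(counts, cell_types, signature_genes):
    """Estimate a per-gene hopping rate from signature genes.

    counts:          dict mapping gene name -> list of per-cell expression values
    cell_types:      list of cell-type labels, one per cell (same order as counts)
    signature_genes: dict mapping cell type -> list of its signature gene names

    Assumption (not from the paper): off-type expression of a signature gene
    is entirely due to index hopping and scales with the library-wide mean.
    """
    rates = []
    for ctype, genes in signature_genes.items():
        for gene in genes:
            row = counts[gene]
            library_mean = mean(row)
            # Expression of this signature gene in cells of every other type.
            off_type = [x for x, t in zip(row, cell_types) if t != ctype]
            if library_mean > 0 and off_type:
                rates.append(mean(off_type) / library_mean)
    if not rates:
        raise ValueError("no usable signature genes")
    # The median keeps a single noisy gene from dominating the estimate.
    return median(rates)


# Toy data invented for illustration: four cells, two cell types.
cell_types = ["A", "A", "B", "B"]
counts = {
    "geneA": [1000, 1000, 5, 5],    # hypothetical signature gene of type A
    "geneB": [10, 10, 2000, 2000],  # hypothetical signature gene of type B
}
signature_genes = {"A": ["geneA"], "B": ["geneB"]}
rate = estimate_hopping_rate(counts, cell_types, signature_genes)
print(f"estimated hopping rate: {rate:.4%}")
```

Plain Python lists keep the sketch dependency-free; a real expression matrix would more likely be held in an array or sparse-matrix type.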
Genomics
What problem does this paper attempt to address?
The problem this paper addresses is **the misassignment of reads caused by index hopping in single-cell RNA sequencing (scRNA-seq) data**. Index hopping is the misassignment of sequence reads from one sample to another because of incorrect barcode (index) assignment during multiplexed sequencing. The problem is particularly prominent with high-capacity sequencers and highly multiplexed scRNA-seq libraries.

### Detailed Explanation

1. **Background and Problem Description**:
   - With the widespread use of high-capacity sequencers and highly multiplexed scRNA-seq libraries, index hopping has become more prevalent.
   - Index hopping assigns reads, and therefore gene expression, to the wrong cell, which can affect the accuracy of experimental results.
2. **Research Methods**:
   - The authors performed deep, plate-based scRNA-seq on a mixed population of mouse skin cells and identified four distinct cell types among 1152 cells.
   - Using signature genes that were highly and specifically expressed in each cell type, they estimated the index hopping rate from the detection of those genes in other cell types, and developed a simple correction method to compensate for it.
3. **Main Findings**:
   - Regardless of gene expression level, an estimated 0.65% of reads per gene were assigned to an incorrect cell.
   - To correct for index hopping, the authors proposed a computational method: for each gene, subtract 0.65% of the library's average expression level from the expression of each cell.
4. **Impact and Application**:
   - The correction had notable effects on transcriptome analyses, including increased cell-cell clustering distance and altered intermediate-state assignments in cell differentiation.
   - The method can be applied to any barcoded, multiplexed scRNA-seq library containing cells with distinct expression profiles, so that the expression matrix is corrected before biological analysis.

### Summary

This paper provides a simple method to estimate and correct for index hopping in scRNA-seq data by quantifying misassigned signature genes across distinct cell types within a library.
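The correction step described above (for each gene, subtract 0.65% of the library's average expression from every cell) can be sketched as follows. This is an illustrative sketch, not the authors' code: the function name and data layout are hypothetical, and clipping negative values to zero is an added assumption that the source does not state.

```python
from statistics import mean

# Rate reported in the abstract for the authors' library; other libraries
# would need their own estimate.
HOPPING_RATE = 0.0065


def correct_index_hopping(counts, rate=HOPPING_RATE):
    """Subtract rate * (library-wide mean of the gene) from every cell.

    counts: dict mapping gene name -> list of per-cell expression values.
    Returns a new dict with the same layout; the input is left unchanged.
    """
    corrected = {}
    for gene, row in counts.items():
        offset = rate * mean(row)
        # Assumption: expression cannot go below zero, so clip after subtracting.
        corrected[gene] = [max(x - offset, 0.0) for x in row]
    return corrected


# Toy data invented for illustration.
counts = {
    "geneA": [1000, 1000, 5, 5],
    "geneB": [100, 0, 0, 0],
}
corrected = correct_index_hopping(counts)
for gene, row in corrected.items():
    print(gene, [round(x, 3) for x in row])
```

Because the offset depends only on the gene's library-wide mean, highly expressed genes lose more in absolute terms, while cells with no reads for a gene stay at zero.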