scCensus: Off-target scRNA-seq reads reveal meaningful biology

Dongze He, Stephen M. Mount, Rob Patro
DOI: https://doi.org/10.1101/2024.01.29.577807
2024-01-31
Abstract: Single-cell RNA-sequencing (scRNA-seq) provides unprecedented insights into cellular heterogeneity. Although scRNA-seq reads from most prevalent and popular tagged-end protocols are expected to arise from the 3′ end of polyadenylated RNAs, recent studies have shown that “off-target” reads can constitute a substantial portion of the read population. In this work, we introduced scCensus, a comprehensive analysis workflow for systematically evaluating and categorizing off-target reads in scRNA-seq. We applied scCensus to seven scRNA-seq datasets. Our analysis of intergenic reads shows that these off-target reads contain information about chromatin structure and can be used to identify similar cells across modalities. Our analysis of antisense reads suggests that these reads can be used to improve gene detection and capture interesting transcriptional activities like antisense transcription. Furthermore, using splice-aware quantification, we find that spliced and unspliced reads provide distinct information about cell clusters and biomarkers, suggesting the utility of integrating signals from reads with different splicing statuses. Overall, our results suggest that off-target scRNA-seq reads contain underappreciated information about various transcriptional activities. These observations about yet-unexploited information in existing scRNA-seq data will help guide and motivate the community to improve current algorithms and analysis methods, and to develop novel approaches that utilize off-target reads to extend the reach and accuracy of single-cell data analysis pipelines.
Bioinformatics
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper aims to explore the informational value of "off-target reads" in single-cell RNA sequencing (scRNA-seq) and proposes a comprehensive workflow, scCensus, to systematically evaluate and classify these off-target reads.

#### Main Research Content:

1. **Prevalence of Off-Target Reads**:
   - The paper demonstrates that off-target reads constitute a significant proportion in multiple scRNA-seq datasets.
   - These off-target reads include antisense reads, intergenic reads, etc.
2. **Biological Significance of Off-Target Reads**:
   - Intergenic reads are associated with open chromatin regions (OCR) and can be used to identify similar cells under different modalities.
   - Antisense reads can enhance gene detection and reveal interesting transcriptional activities, such as antisense transcription.
   - Reads from different splicing states provide unique information about cell clusters and biomarkers, indicating the importance of integrating signals from different splicing states.
3. **scCensus Workflow**:
   - A Nextflow-based workflow, scCensus, is proposed to systematically classify off-target scRNA-seq reads into different genomic feature groups.
   - scRNA-seq reads are categorized into three types: sense intragenic, antisense intragenic, and intergenic reads.

#### Results and Findings:

1. **Relationship Between Intergenic Reads and Open Chromatin Regions**:
   - Intergenic reads are enriched near open chromatin regions and can provide information about these regions.
   - OCR-associated reads can produce clustering results consistent with standard methods at low resolution.
2. **Correlation and Application of Sense and Antisense Reads**:
   - The quantification results of sense and antisense intragenic reads are highly correlated but not identical, suggesting that antisense reads can be used to improve gene detection.
   - Some antisense reads may originate from genuine antisense transcripts.
3. **Informational Value of Reads from Different Splicing States**:
   - Using splicing-aware quantification methods, it was found that clustering results among spliced, unspliced, and ambiguous matrices are somewhat consistent but also show informative differences.
   - Specific mature marker genes for certain cell types were found in each count matrix, further indicating that reads from different splicing states should be processed and analyzed separately and integrated later.

### Summary

Through systematic analysis, this paper demonstrates that off-target scRNA-seq reads contain valuable biological information and suggests that the research community should expand current analytical methods to include these off-target fragments. It advocates for the development of new methods and technologies to utilize this information, thereby improving the coverage and accuracy of single-cell data analysis pipelines.
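
The three-way categorization mentioned above (sense intragenic, antisense intragenic, intergenic) can be illustrated with a small sketch. This is not the scCensus implementation, which the summary describes only as a Nextflow-based workflow. The `Gene` and `Read` types, the `classify_read` function, and the toy coordinates are hypothetical. The sketch also assumes that reads align to the same strand as the transcript they come from, and that a read overlapping genes on both strands is counted as sense.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Gene:
    """Hypothetical, simplified gene annotation (0-based, half-open interval)."""
    name: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


@dataclass(frozen=True)
class Read:
    """Hypothetical aligned read, reduced to a single interval and strand."""
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def classify_read(read: Read, genes: List[Gene]) -> str:
    """Return 'sense_intragenic', 'antisense_intragenic', or 'intergenic'.

    Assumption: a read overlapping any gene on its own strand counts as
    sense, even if it also overlaps a gene on the opposite strand.
    """
    overlaps_antisense = False
    for gene in genes:
        if gene.chrom != read.chrom:
            continue
        # Half-open intervals overlap when each starts before the other ends.
        if read.start < gene.end and gene.start < read.end:
            if gene.strand == read.strand:
                return "sense_intragenic"
            overlaps_antisense = True
    return "antisense_intragenic" if overlaps_antisense else "intergenic"


# Toy annotation and reads, invented purely for illustration.
genes = [
    Gene("geneA", "chr1", 1000, 5000, "+"),
    Gene("geneB", "chr1", 8000, 9000, "-"),
]
reads = [
    Read("chr1", 4900, 5000, "+"),  # inside geneA, same strand
    Read("chr1", 1200, 1300, "-"),  # inside geneA, opposite strand
    Read("chr1", 6000, 6100, "+"),  # between geneA and geneB
]
labels = [classify_read(r, genes) for r in reads]
print(labels)
```

A linear scan over genes keeps the example short. A real annotation would call for an interval index, and a real workflow would have to handle spliced alignments and protocol-specific strandedness, neither of which this sketch models.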