Abstract: Typical high-throughput single-cell RNA-sequencing (scRNA-seq) analyses rely primarily on (pseudo)alignment, view the data through the lens of annotated gene models, and aim to detect differential gene expression. This approach misses transcriptome diversity generated by other mechanisms, such as splicing and V(D)J recombination, and is blind to sequences missing from imperfect reference genomes. Here, we present sc-SPLASH, a highly efficient pipeline that extends our SPLASH framework for statistics-first, reference-free discovery to barcoded scRNA-seq (10x Chromium) and spatial transcriptomics (10x Visium); we also provide BKC, its optimized module for preprocessing and k-mer counting in barcoded data, as a standalone tool. sc-SPLASH rediscovers known biology, including V(D)J recombination and cell-type-specific alternative splicing in human and trans-splicing in tunicate (Ciona), and, when applied to spatial datasets, detects sequence variation including tumor-specific somatic mutations. In sponge (Spongilla) and tunicate (Ciona), we uncover secreted repeat proteins that are expressed in immune-type cells and regulated during development; the sponge genes were absent from the reference assembly. sc-SPLASH provides a powerful alternative tool for exploring transcriptomes that is applicable across the breadth of life's diversity.

sc-SPLASH provides ultra-efficient reference-free discovery in barcoded single-cell sequencing