Abstract:

Motivation: The RNA-Seq technique has proven to be a revolutionary means of exploring the transcriptome because it provides deep coverage and base pair-level resolution. RNA-Seq quantification has proven to be an efficient alternative to microarrays in gene expression studies, and it is a critical component of RNA-Seq differential expression analysis. Most existing RNA-Seq quantification tools require the alignment of fragments to either a genome or a transcriptome, entailing a time-consuming and intricate alignment step. To improve the performance of RNA-Seq quantification, an alignment-free method, Sailfish, has recently been proposed to quantify transcript abundances using all k-mers in the transcriptome, demonstrating the feasibility of designing an efficient alignment-free method for transcriptome quantification. Even though Sailfish is substantially faster than alternative alignment-dependent methods such as Cufflinks, its use of all k-mers in the transcriptome impedes the scalability of the method.

Results: We propose a novel RNA-Seq quantification method, RNA-Skim, which partitions the transcriptome into disjoint transcript clusters based on sequence similarity and introduces the notion of sig-mers, a special type of k-mer uniquely associated with each cluster. We demonstrate that the sig-mer counts within a cluster are sufficient for estimating transcript abundances with accuracy comparable to that of any state-of-the-art method. This enables RNA-Skim to perform transcript quantification on each cluster independently, reducing a complex optimization problem to smaller optimization tasks that can be run in parallel. As a result, RNA-Skim uses <4% of the k-mers and <10% of the CPU time required by Sailfish. It is able to finish transcriptome quantification in <10 min per sample using just a single thread on a commodity computer, which represents a >100× speedup over the state-of-the-art alignment-based methods, while delivering comparable or higher accuracy.

Availability and implementation: The software is available at http://www.csbio.unc.edu/rs.

Contact: weiwang@cs.ucla.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
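The notion of a sig-mer described in the Results can be illustrated with a minimal Python sketch. It is not the RNA-Skim implementation: the function names, the toy sequences, the cluster assignments and the choice of k = 4 are all hypothetical. The sketch ignores reverse complements, and it stops at counting rather than estimating abundances. It only shows the defining property that a sig-mer occurs in exactly one transcript cluster, so each read k-mer that is a sig-mer can be attributed to a single cluster and the clusters can be processed independently.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every k-mer (substring of length k) in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def find_sig_mers(clusters, k):
    """Return, per cluster, the k-mers that occur in no other cluster.

    clusters: dict mapping a cluster name to a list of transcript sequences.
    """
    owners = defaultdict(set)  # k-mer -> names of clusters containing it
    for name, transcripts in clusters.items():
        for seq in transcripts:
            for kmer in kmers(seq, k):
                owners[kmer].add(name)

    sig_mers = {name: set() for name in clusters}
    for kmer, names in owners.items():
        if len(names) == 1:  # unique to one cluster, so it is a sig-mer
            sig_mers[next(iter(names))].add(kmer)
    return sig_mers


def count_sig_mers(reads, sig_mers, k):
    """Count sig-mer occurrences in the reads, grouped by cluster."""
    lookup = {kmer: name for name, mers in sig_mers.items() for kmer in mers}
    counts = {name: defaultdict(int) for name in sig_mers}
    for read in reads:
        for kmer in kmers(read, k):
            name = lookup.get(kmer)
            if name is not None:  # k-mers shared between clusters are skipped
                counts[name][kmer] += 1
    return counts


# Hypothetical toy data: two clusters of two short "transcripts" each.
toy_clusters = {
    "cluster_A": ["ACGTACGA", "ACGTACGG"],
    "cluster_B": ["TTGACGTA", "TTGACCCA"],
}
toy_sig_mers = find_sig_mers(toy_clusters, k=4)
toy_counts = count_sig_mers(["GTACGA", "TTGACC"], toy_sig_mers, k=4)
```

In this toy example the 4-mers ACGT and CGTA occur in both clusters, so they are not sig-mers and are ignored when the reads are scanned. The per-cluster count tables in `toy_counts` are the kind of input that a per-cluster abundance estimator could then work on separately.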

RNA-Skim: a Rapid Method for RNA-Seq Quantification at Transcript Level.