Abstract: In recent years, RNA-seq has emerged as a powerful technology for estimating gene and/or transcript expression, and RPKM (Reads Per Kilobase per Million mapped reads) is widely used to represent the relative abundance of a gene's mRNAs. In general, gene quantification methods fall into two broad categories: the transcript-based approach and the 'union exon'-based approach. The transcript-based approach is intrinsically more difficult because the isoforms of a gene typically share a large proportion of their genomic sequence. The 'union exon'-based approach, on the other hand, is much simpler and is therefore widely used for RNA-seq gene quantification. Biologically, however, a gene is expressed as one or more transcript isoforms, so the transcript-based approach is logically more meaningful than the 'union exon'-based approach. Although gene quantification is a fundamental task in most RNA-seq studies, it remains unclear whether the 'union exon'-based approach is good practice. In this paper, we carried out a side-by-side comparison of the 'union exon'-based and transcript-based approaches to RNA-seq gene quantification. We found that gene expression levels are significantly underestimated by the 'union exon'-based approach: the average RPKM from the 'union exon'-based method is less than 50% of the mean expression obtained from the transcript-based approach. The difference between the two approaches depends primarily on the number of transcripts in a gene. We also performed differential analysis at both the gene and transcript levels and found that isoform-level differential analysis yields additional insights, such as isoform switches. The accuracy of isoform quantification would improve if read coverage patterns and exon-exon spanning reads were taken into account and incorporated into the EM (Expectation Maximization) algorithm.
Our investigation discourages the use of the 'union exon'-based approach for gene quantification despite its simplicity.
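To make the contrast concrete, the following Python sketch compares the two approaches on a single hypothetical gene. Everything in it is an illustrative assumption rather than the study's data or pipeline: the exon coordinates, read counts and library size are invented; the helper names (`rpkm`, `union_length`, `em_isoform_fractions`) are hypothetical; the EM is a plain textbook version that assumes uniform read starts along each isoform (it ignores the coverage patterns and exon-exon spanning reads mentioned above, and uses full rather than effective isoform lengths); and the transcript-based gene value is taken to be the sum of isoform RPKMs.

```python
"""Toy sketch: 'union exon' vs transcript-based gene RPKM.

Hypothetical example only: the exons, read counts, library size and the
plain uniform-coverage EM are illustrative assumptions, not the study's
data or exact algorithm.
"""


def rpkm(reads, length_bp, total_mapped_reads):
    """Reads Per Kilobase of feature per Million mapped reads."""
    return reads / (length_bp / 1e3) / (total_mapped_reads / 1e6)


def union_length(exons):
    """Total bases covered by the union of half-open [start, end) exons."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(exons):
        if cur_end is not None and start <= cur_end:
            cur_end = max(cur_end, end)  # overlapping or adjacent: extend block
        else:
            if cur_end is not None:
                total += cur_end - cur_start  # close the previous block
            cur_start, cur_end = start, end
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def em_isoform_fractions(read_classes, lengths, n_iter=200):
    """Plain EM for the fraction of reads generated by each isoform.

    read_classes: list of (read count, indices of compatible isoforms).
    Assumes reads start uniformly along each isoform (no coverage bias).
    """
    k = len(lengths)
    alpha = [1.0 / k] * k
    n_reads = sum(count for count, _ in read_classes)
    for _ in range(n_iter):
        expected = [0.0] * k
        # E-step: share each class among compatible isoforms in
        # proportion to alpha_i / length_i (uniform read-start model).
        for count, compat in read_classes:
            weights = {i: alpha[i] / lengths[i] for i in compat}
            norm = sum(weights.values())
            for i, w in weights.items():
                expected[i] += count * w / norm
        # M-step: re-estimate read fractions from expected counts.
        alpha = [e / n_reads for e in expected]
    return alpha


# Hypothetical gene: isoform A = E1 + E2, isoform B = E1 + E3.
E1, E2, E3 = (0, 500), (500, 1000), (1000, 1500)
isoforms = {"A": [E1, E2], "B": [E1, E3]}
lengths = [sum(end - start for start, end in ex) for ex in isoforms.values()]
# Reads in E1 are ambiguous; reads in E2 or E3 identify one isoform.
read_classes = [(100, [0, 1]), (75, [0]), (25, [1])]
total_mapped = 1_000_000  # hypothetical library size

gene_reads = sum(count for count, _ in read_classes)
all_exons = [e for ex in isoforms.values() for e in ex]
union_rpkm = rpkm(gene_reads, union_length(all_exons), total_mapped)

alpha = em_isoform_fractions(read_classes, lengths)
isoform_rpkm = [rpkm(a * gene_reads, length, total_mapped)
                for a, length in zip(alpha, lengths)]
transcript_rpkm = sum(isoform_rpkm)

print(f"union-exon gene RPKM:       {union_rpkm:.1f}")
print(f"isoform RPKMs (EM):         {[round(x, 1) for x in isoform_rpkm]}")
print(f"transcript-based gene RPKM: {transcript_rpkm:.1f}")
```

In this toy gene the union of exons (1,500 bp) is longer than either isoform (1,000 bp each), so dividing all 200 gene reads by the union length gives about 133 RPKM, whereas the EM assigns about 150 and 50 reads to the two isoforms and their RPKMs sum to about 200. Adding isoforms with distinct exons lengthens the union further, which is one way to see why the gap can grow with the number of transcripts in a gene.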

PennSeq: Accurate Isoform-Specific Gene Expression Quantification in RNA-Seq by Modeling Non-Uniform Read Distribution

Joint Estimation of Isoform Expression and Isoform-Specific Read Distribution Using Multisample RNA-Seq Data

Improving the Diversity of Captured Full-Length Isoforms Using a Normalized Single-Molecule RNA-sequencing Method

Statistical Inferences for Isoform Expression in RNA-Seq

Isoform Abundance Inference Provides a More Accurate Estimation of Gene Expression Levels in RNA-seq

A penalized likelihood approach for robust estimation of isoform expression

MSIQ: Joint modeling of multiple RNA-seq samples for accurate isoform quantification

Statistical Modeling of RNA-Seq Data

Accurate isoform quantification by joint short- and long-read RNA-sequencing

Estimation of Isoform Expression in RNA-Seq Data Using a Hierarchical Bayesian Model

Robust estimation of isoform expression with RNA-Seq data

BIISQ: Bayesian nonparametric discovery of Isoforms and Individual Specific Quantification

Evaluation and comparison of computational tools for RNA-seq isoform quantification

Modeling non-uniformity in short-read rates in RNA-Seq data

Using non-uniform read distribution models to improve isoform expression inference in RNA-Seq

Union Exon Based Approach for RNA-Seq Gene Quantification: To Be or Not to Be?

Comprehensive Assessment of Isoform Detection Methods for Third-Generation Sequencing Data

Network-Based Isoform Quantification with RNA-Seq Data for Cancer Transcriptome Analysis

rSeqDiff: Detecting Differential Isoform Expression from RNA-Seq Data Using Hierarchical Likelihood Ratio Test

Statistical modeling of isoform splicing dynamics from RNA-seq time series data

DEGPS is a Powerful Tool for Detecting Differential Expression in RNA-sequencing Studies