PBSeq: Modeling base-level bias to estimate gene and isoform expression for RNA-seq data

Li Zhang, Xuejun Liu
DOI: https://doi.org/10.1007/s13042-016-0497-z
2016-01-01
Abstract: Owing to its unprecedented throughput and resolution, RNA-seq has rapidly become a revolutionary and powerful technology for transcriptome analysis. However, RNA-seq library preparation results in a non-uniform distribution of reads across the represented genes. When estimating gene and isoform expression levels, this non-uniformity needs to be accounted for and corrected to improve estimation accuracy. In this paper, we propose PBSeq, a Poisson model that uses a base-level bias correction strategy to estimate gene and isoform expression. The base-level bias correction strategy simultaneously considers the positional and sequence-specific biases at the starting positions of reads mapped to the genes of interest. PBSeq not only provides expression values but also estimates the uncertainty associated with each expression estimate, which represents the variation across replicates and is useful for downstream analysis. We use one simulated dataset and three real RNA-seq datasets to validate the PBSeq model. The results show that PBSeq can accurately estimate gene and isoform expression levels and is computationally efficient compared with other state-of-the-art methods.
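To make the idea of a bias-corrected Poisson model concrete, the following is a minimal sketch and not the PBSeq implementation. It assumes a single gene with no isoform ambiguity, that the number of reads starting at base i follows a Poisson distribution with mean `bias_weights[i] * theta`, and that the per-base bias weights are already known. The function name, the weights, and the standard-error formula are illustrative assumptions; in particular, the standard error here is a simple sampling error from the Fisher information, not the across-replicate uncertainty that the abstract describes.

```python
import math


def estimate_expression(counts, bias_weights):
    """Hypothetical helper: maximum-likelihood estimate of theta under
    counts[i] ~ Poisson(bias_weights[i] * theta).

    Setting the derivative of the log-likelihood to zero gives
    theta = sum(counts) / sum(bias_weights).
    """
    if len(counts) != len(bias_weights):
        raise ValueError("counts and bias_weights must have equal length")
    total_weight = sum(bias_weights)
    if total_weight <= 0:
        raise ValueError("bias weights must sum to a positive value")

    theta = sum(counts) / total_weight
    # The Fisher information for theta is sum(w) / theta, so the
    # asymptotic variance of the estimate is theta / sum(w).
    std_err = math.sqrt(theta / total_weight)
    return theta, std_err


# Illustrative (made-up) read-start counts and bias weights for three bases.
theta, std_err = estimate_expression([4, 10, 6], [0.5, 1.5, 1.0])
print(f"theta = {theta:.3f}, std_err = {std_err:.3f}")
```

With all weights equal to 1 the estimate reduces to the mean read-start count per base, which is the uniform-coverage assumption that bias correction is meant to relax.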