PseudoLasso: leveraging read alignment in homologous regions to correct pseudogene expression estimates via RNASeq.

Chelsea J.-T. Ju, Zhuangtian Zhao, Wei Wang
DOI: https://doi.org/10.1145/2649387.2649447
2014-01-01
Abstract: Pseudogenes have long been considered nonfunctional segments of the genome, but recent studies have provided evidence of their novel regulatory roles in biological processes. With growing interest in pseudogene research, scientists rely on RNA sequencing technology to estimate the expression levels of pseudogenes in different tissues or cell lines. The major challenge in quantifying pseudogenes with RNASeq lies in the high sequence similarity between pseudogenes and their homologous parents: reads can be ambiguously aligned to multiple homologous regions. In this article, we present PseudoLasso, a genome-wide approach to accurately estimate the abundance of pseudogenes and their parents, and to correctly align reads to their origins. Our approach focuses on learning read alignment behaviors and leveraging this knowledge for abundance estimation and alignment correction. Compared with the read count estimates reported by TopHat2, PseudoLasso provides estimates with a 10-fold lower error rate.
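The abstract does not spell out the model, so the following is only a toy sketch of the general idea of correcting read counts with learned alignment behavior, not the authors' implementation. It assumes a hypothetical matrix `A`, where `A[j][i]` is the fraction of reads originating from region `i` that an aligner places in region `j` (for example, estimated from simulated reads), and it recovers the true counts with a non-negative Lasso solved by coordinate descent. The function name, the matrix values, and the counts are all illustrative assumptions.

```python
def nonneg_lasso(A, y, lam=0.0, n_iter=500):
    """Coordinate descent for: min_x 0.5*||y - A x||^2 + lam*sum(x), subject to x >= 0."""
    n_rows, n_cols = len(A), len(A[0])
    x = [0.0] * n_cols
    residual = list(y)  # equals y - A x, since x starts at zero
    col_sq = [sum(A[j][i] ** 2 for j in range(n_rows)) for i in range(n_cols)]
    for _ in range(n_iter):
        for i in range(n_cols):
            if col_sq[i] == 0.0:
                continue
            # Correlate column i with the residual, adding back x[i]'s own contribution.
            rho = sum(A[j][i] * (residual[j] + A[j][i] * x[i]) for j in range(n_rows))
            new_xi = max(0.0, (rho - lam) / col_sq[i])
            delta = new_xi - x[i]
            if delta != 0.0:
                for j in range(n_rows):
                    residual[j] -= A[j][i] * delta
                x[i] = new_xi
    return x


# Hypothetical alignment behavior: columns = true origin (parent, pseudogene),
# rows = region where the aligner reports the read (parent, pseudogene).
A = [
    [0.9, 0.3],  # 90% of parent reads and 30% of pseudogene reads land on the parent
    [0.1, 0.7],  # 10% of parent reads and 70% of pseudogene reads land on the pseudogene
]
true_counts = [1000.0, 200.0]  # made-up ground truth for the illustration
observed = [
    sum(A[j][i] * true_counts[i] for i in range(2)) for j in range(2)
]  # the raw counts an aligner would report under this model

estimates = nonneg_lasso(A, observed, lam=0.0)
print("observed:", observed)
print("corrected:", [round(v, 2) for v in estimates])
```

In this toy setting the raw counts overstate the pseudogene and understate the parent, while the corrected estimates return to the assumed true counts. A positive `lam` would shrink small estimates toward zero, which is the usual motivation for a Lasso penalty when most candidate regions are unexpressed.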