A censored-Poisson model based approach to the analysis of RNA-seq data

Xing Chen, Yinglei Lai
DOI: https://doi.org/10.1007/s40484-020-0208-3
2020-06-01
Quantitative Biology
Abstract:
Background: With recent advances in sequencing technology, the collection of RNA expression (RNA-seq) data has been growing rapidly. RNA-seq data are statistically count-type measurements. The Poisson distribution is a basic probability distribution for modeling count-type data. With Poisson regression models, various experimental factors, GC content, and alternative splicing isoforms can be flexibly considered in RNA-seq data analysis. Biases in RNA-seq data, which arise from the biochemical and technical limitations of sequencing technology, have been recognized.
Methods: In this study, we propose an artificial censoring approach for an isoform-specific Poisson regression model for analyzing RNA-seq data. Low expression values can be grouped (censored) into one probability category, and high expression values can be grouped (censored) into another probability category. We have implemented the related Newton-Raphson numerical procedure to obtain the maximum likelihood estimates for our censored-Poisson regression model. The related mathematical simplifications have been derived for stable and convenient numerical computing.
Results: The advantages of our artificial censoring approach have been demonstrated in both simulation studies and an application to experimental data.
Conclusions: Our proposed artificial censoring approach allows us to focus on the majority of the data. As the extreme values (tails) of the data are artificially censored, more efficient analysis results can be obtained, even from relatively simple Poisson regression models. Our proposed artificial censoring approach can also be considered for other well-developed models or methods for RNA-seq data analysis.
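The censoring idea described under Methods can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an intercept-only model (a single Poisson rate, no isoform structure or covariates), hypothetical thresholds `low` and `high`, and finite-difference derivatives in place of the analytic simplifications the paper derives. All function names are illustrative.

```python
import math

def pois_logpmf(k, lam):
    # log P(Y = k) for Y ~ Poisson(lam)
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def pois_cdf(k, lam):
    # P(Y <= k); an empty sum (k < 0) gives 0
    return sum(math.exp(pois_logpmf(j, lam)) for j in range(k + 1))

def censored_loglik(theta, counts, low, high):
    # theta is the log rate. Counts <= low share one probability
    # category, counts >= high share another (assumed thresholds).
    lam = math.exp(theta)
    tiny = 1e-300  # guard against log(0) from rounding
    p_low = max(pois_cdf(low, lam), tiny)               # P(Y <= low)
    p_high = max(1.0 - pois_cdf(high - 1, lam), tiny)   # P(Y >= high)
    total = 0.0
    for y in counts:
        if y <= low:
            total += math.log(p_low)
        elif y >= high:
            total += math.log(p_high)
        else:
            total += pois_logpmf(y, lam)
    return total

def fit_newton(counts, low, high, theta0=None, tol=1e-6, max_iter=50, h=1e-4):
    # Newton-Raphson on theta with central finite-difference
    # gradient and Hessian (an assumption made for brevity).
    if theta0 is None:
        theta0 = math.log(max(sum(counts) / len(counts), 0.5))
    theta = theta0
    for _ in range(max_iter):
        f0 = censored_loglik(theta, counts, low, high)
        fp = censored_loglik(theta + h, counts, low, high)
        fm = censored_loglik(theta - h, counts, low, high)
        grad = (fp - fm) / (2 * h)
        hess = (fp - 2 * f0 + fm) / (h * h)
        if hess >= 0:  # not locally concave; stop rather than diverge
            break
        step = grad / hess
        theta -= step
        if abs(step) < tol:
            break
    return math.exp(theta)  # estimated rate

# One extreme count (60) and one zero are grouped into the tail categories.
counts = [3, 5, 4, 6, 2, 5, 4, 3, 0, 60]
rate = fit_newton(counts, low=1, high=8)
```

In this toy example the single large count only contributes through the probability of the upper category, so it pulls the estimated rate far less than it pulls the raw sample mean.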
mathematical & computational biology