PATMAP: Polyadenylation Site Identification from Next-Generation Sequencing Data

Xiaohui Wu, Meishuang Tang, Junfeng Yao, Shuiyuan Lin, Zhe Xiang, Guoli Ji
DOI: https://doi.org/10.1007/978-3-642-28942-2_44
2012-01-01
Abstract: Polyadenylation is an essential post-transcriptional processing step in the maturation of eukaryotic mRNA. The coming flood of next-generation sequencing (NGS) data creates new opportunities for intensive study of polyadenylation. We present an automated workflow called PATMAP that identifies polyadenylation sites (poly(A) sites) by integrating NGS data cleaning, processing, mapping, normalization, and clustering. First, the concept of an ambiguous region was introduced to parse the genome annotation. Then a series of Perl scripts was seamlessly integrated to iteratively map the single-end or paired-end sequences to the reference genome. After mapping, the poly(A) tags (PATs) at the same coordinate were grouped into one cleavage site, and internal priming artifacts were removed. Finally, the cleavage sites from different samples were normalized by an MA-based method and clustered into poly(A) clusters (PACs) by an empirical Bayesian method. The effectiveness of PATMAP was demonstrated by identifying thousands of reliable PACs from millions of NGS sequences in Arabidopsis and yeast.
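The post-mapping steps described in the abstract (grouping PATs by coordinate, filtering internal priming, and merging cleavage sites into PACs) can be illustrated with a minimal sketch. This is not the PATMAP implementation, which is written in Perl and clusters by an empirical Bayesian method; the sketch below is in Python and substitutes a simple distance-based merge. All function names, the A-richness rule (`window`, `min_a`), and the `max_gap` threshold are assumptions chosen for illustration only.

```python
from collections import Counter


def group_pats(pats):
    """Collapse PATs sharing (chrom, strand, coord) into cleavage sites with tag counts."""
    return Counter(pats)


def is_internal_priming(downstream, window=10, min_a=7):
    """Hypothetical filter: flag a site whose downstream genomic window is A-rich.

    The window size and threshold are illustrative assumptions, not values from the paper.
    """
    seq = downstream[:window].upper()
    return seq.count("A") >= min_a


def cluster_sites(sites, max_gap=24):
    """Merge neighbouring cleavage sites on the same chromosome and strand.

    A plain distance-based merge stands in for the empirical Bayesian
    clustering named in the abstract; max_gap is an assumed threshold.
    """
    clusters = []
    for (chrom, strand, coord), count in sorted(sites.items()):
        last = clusters[-1] if clusters else None
        if (last is not None and last["chrom"] == chrom
                and last["strand"] == strand
                and coord - last["end"] <= max_gap):
            # Extend the current cluster and add this site's tags to it.
            last["end"] = coord
            last["count"] += count
        else:
            clusters.append({"chrom": chrom, "strand": strand,
                             "start": coord, "end": coord, "count": count})
    return clusters


# Toy input: each PAT is (chromosome, strand, cleavage coordinate).
pats = [("chr1", "+", 100), ("chr1", "+", 100), ("chr1", "+", 110),
        ("chr1", "+", 500), ("chr1", "-", 105)]
sites = group_pats(pats)
pacs = cluster_sites(sites)
```

In this toy input the two plus-strand sites at 100 and 110 merge into one cluster, while the site at 500 and the minus-strand site remain separate. In a real run, sites flagged by the internal priming filter would be dropped before clustering.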