A Pipeline to Identify Novel 3’ UTRs and Widespread Intergenic Transcription by Combination of Polyadenylation Sites and RNA-seq Data

Hongjuan Fu, Yibo Zhuang, Xiaohui Wu, Guoli Ji
DOI: https://doi.org/10.1145/3431943.3432289
2020-01-01
Abstract: Recent genomic studies continue to uncover widespread occurrences of polyadenylation [poly(A)] sites in presumed intergenic regions, providing new opportunities to investigate the complexity of 3’ untranslated regions (3’ UTRs) and intergenic transcription. Here, we developed a pipeline to detect novel 3’ UTRs, novel genes, and intergenic transcribed units by combining real and predicted poly(A) sites, archival ESTs, and RNA-seq data. Applying it to data from Medicago truncatula, a model organism for legume biology, we identified more than 3100 novel 3’ UTRs, including 1932 normal 3’ UTRs averaging 1482 nt in length and 1261 distal 3’ UTRs each with elongation ≥ 5000 nt. We also discovered up to 632 novel genes and 12,765 intergenic transcribed units in previously uncharacterized intergenic regions. These new 3’ UTRs, novel genes, and intergenic transcribed units substantially extend the scope of the plant transcriptome and should be incorporated into the current M. truncatula genome annotation to support more comprehensive genomic studies, e.g., searching for microRNA targets or other regulatory elements.