UNAGI: Yeast Transcriptome Reconstruction and Gene Discovery Using Nanopore Sequencing

Mohamad Al Kadi, Nicolas Jung, Daisuke Okuzaki
DOI: https://doi.org/10.1007/978-1-0716-2257-5_6
Abstract: Genome annotation relies mainly on computational approaches, but their accuracy is low: untranslated regions are not identified, complex isoforms are not predicted correctly, and the discovery rate of noncoding RNA is low. RNA-seq has revolutionized transcriptome reconstruction over the last decade. However, the fragmentation involved in cDNA sequencing leads to information loss, so transcripts must be assembled and reconstructed, which reduces the accuracy of the reconstructed transcriptome. Recently, long-read sequencing has been introduced with technologies such as Oxford Nanopore sequencing. cDNA is sequenced directly without fragmentation, producing long reads that do not need to be assembled. This keeps the transcript structure intact and increases the accuracy of transcriptome reconstruction. Here we present a protocol and a pipeline to reconstruct the transcriptome of compact genomes, including those of yeasts. The protocol involves generating full-length cDNA and using an Oxford Nanopore ligation-based sequencing kit to sequence multiple samples in the same run. The pipeline (1) strands the generated long reads, (2) corrects the reads by mapping them to the reference genome, (3) identifies transcripts, including their 5'UTR and 3'UTR, (4) profiles the isoforms, filtering out artifacts that result from low sequencing accuracy, and (5) improves the accuracy of the provided annotations. Using long reads improves the accuracy of transcriptome reconstruction and helps in discovering a significant number of novel RNAs.
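To make pipeline steps (1) and (4) more concrete, the following is a minimal Python sketch of one way they could be approached. It is not UNAGI's actual implementation, which the abstract does not describe. The poly(A)/poly(T) heuristic, the function names `strand_read` and `filter_isoforms`, and the thresholds `window`, `min_run`, and `min_support` are all illustrative assumptions.

```python
from collections import Counter

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def strand_read(seq, window=30, min_run=12):
    """Orient a cDNA read using a simple poly(A)/poly(T) heuristic (assumption).

    A run of A's near the 3' end suggests the read is in transcript
    orientation; a run of T's near the 5' end suggests it is the reverse
    complement. Returns (oriented_seq, strand), where strand is '+', '-',
    or None when the orientation cannot be decided.
    """
    seq = seq.upper()
    has_poly_a = "A" * min_run in seq[-window:]
    has_poly_t = "T" * min_run in seq[:window]
    if has_poly_a and not has_poly_t:
        return seq, "+"
    if has_poly_t and not has_poly_a:
        return reverse_complement(seq), "-"
    return seq, None  # ambiguous or no tail found


def filter_isoforms(junction_chains, min_support=3):
    """Keep splice-junction chains seen in at least `min_support` reads.

    `junction_chains` holds one hashable chain per read, for example a
    tuple of (donor, acceptor) coordinates. Rare chains are treated as
    likely sequencing artifacts and dropped (hypothetical threshold).
    """
    counts = Counter(junction_chains)
    return {chain: n for chain, n in counts.items() if n >= min_support}
```

In this sketch, a read ending in a long A run is kept as is, a read starting with a long T run is reverse-complemented, and isoform candidates supported by fewer than `min_support` reads are discarded. Real nanopore data would also need adapter trimming and tolerance for sequencing errors inside the homopolymer runs.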