IsoTree: De Novo Transcriptome Assembly from RNA-Seq Reads

Jin Zhao, Haodi Feng, Daming Zhu, Chi Zhang, Ying Xu
DOI: https://doi.org/10.1007/978-3-319-59575-7_7
2017-01-01
Abstract: High-throughput sequencing of mRNA has made deep and efficient probing of transcriptomes more affordable. However, the vast number of short RNA-seq reads makes de novo transcriptome assembly an algorithmic challenge. In this work, we present IsoTree, a novel framework for transcript reconstruction in the absence of a reference genome. Unlike most de novo assembly methods, which build a de Bruijn graph or splicing graph by connecting k-mers (overlapping substrings of length k generated from reads), IsoTree constructs the splicing graph by connecting reads directly. For each splicing graph, IsoTree applies an iterative mixed integer linear programming scheme to build a prefix tree, called an isoform tree. Each path from the root node of the isoform tree to a leaf node represents a plausible transcript candidate; the candidates are then pruned using paired-end read information. Experiments showed that IsoTree achieves better recall on both paired-end and single-end reads, and better precision on paired-end reads, than other leading transcript assembly programs, including Cufflinks, StringTie and BinPacker.
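To make two of the notions in the abstract concrete, the following is a minimal illustrative sketch, not IsoTree's implementation. It shows how k-mers are cut from a read, and how root-to-leaf paths of a prefix tree can be enumerated as candidate transcripts. The names `kmers`, `Node`, `insert_path` and `root_to_leaf_paths`, and the segment labels `E1` to `E4`, are hypothetical and chosen only for illustration. The mixed integer linear program that builds the isoform tree and the paired-end pruning step are not modelled here.

```python
from typing import Dict, List


def kmers(read: str, k: int) -> List[str]:
    """Return the overlapping length-k substrings of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


class Node:
    """One node of a toy prefix tree; children are keyed by segment label."""

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}


def insert_path(root: Node, segments: List[str]) -> None:
    """Add one ordered list of segment labels to the tree, sharing prefixes."""
    node = root
    for seg in segments:
        node = node.children.setdefault(seg, Node())


def root_to_leaf_paths(root: Node) -> List[List[str]]:
    """Enumerate every root-to-leaf path; each one is a candidate transcript."""
    paths: List[List[str]] = []
    stack = [(root, [])]
    while stack:
        node, prefix = stack.pop()
        if not node.children and prefix:
            paths.append(prefix)  # reached a leaf
        for label, child in node.children.items():
            stack.append((child, prefix + [label]))
    return sorted(paths)


# Hypothetical example: three candidate isoforms that share the prefix E1.
tree = Node()
insert_path(tree, ["E1", "E2", "E3"])
insert_path(tree, ["E1", "E3"])
insert_path(tree, ["E1", "E2", "E4"])

read_kmers = kmers("ACGTAC", 4)
candidates = root_to_leaf_paths(tree)
```

In this toy setting, shared prefixes are stored once, so alternative isoforms that begin with the same segments diverge only where their structure differs. A real assembler would additionally score or filter the enumerated paths, as the abstract describes for paired-end reads.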