P_RNA_scaffolder: a Fast and Accurate Genome Scaffolder Using Paired-End RNA-sequencing Reads

Bai-Han Zhu, Jun Xiao, Wei Xue, Gui-Cai Xu, Ming-Yuan Sun, Jiong-Tang Li
DOI: https://doi.org/10.1186/s12864-018-4567-3
IF: 4.547
2018-01-01
BMC Genomics
Abstract:

Background: Obtaining complete gene structures is a major goal of genome assembly. Some gene regions are fragmented in both low-quality and high-quality assemblies, so new approaches are needed to recover them. Genomes are widely transcribed, generating messenger and non-coding RNAs. These widespread transcripts can be used to scaffold genomes and complete transcribed regions.

Results: We present P_RNA_scaffolder, a fast and accurate tool that uses paired-end RNA-sequencing reads to scaffold genomes. This tool aims to improve the completeness of both protein-coding and non-coding genes. After it was applied to scaffolding human contigs, the structures of both protein-coding genes and circular RNAs were almost completely recovered and equivalent to those in a complete genome, especially for long proteins and long circular RNAs. Tested in various species, P_RNA_scaffolder exhibited higher speed and efficiency than the existing state-of-the-art scaffolders. It also improved the contiguity of genome assemblies generated by current mate-pair scaffolding and by third-generation single-molecule sequencing assembly.

Conclusions: P_RNA_scaffolder can improve the contiguity of genome assemblies and benefit gene prediction. The tool is available at http://www.fishbrowser.org/software/P_RNA_scaffolder.
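
The general idea of scaffolding with paired-end reads can be illustrated with a small sketch. When the two mates of a read pair align to different contigs, the pair is evidence that those contigs belong together. The code below is only an illustration of that idea and is not the P_RNA_scaffolder implementation: the function name `contig_links`, the input format (one tuple of contig names per read pair) and the `min_support` threshold are all assumptions made for this example.

```python
from collections import Counter


def contig_links(pair_alignments, min_support=2):
    """Count read pairs whose mates align to different contigs.

    pair_alignments: iterable of (contig_of_mate1, contig_of_mate2);
    None marks an unaligned mate. Hypothetical input format.
    Returns {(contig_a, contig_b): pair_count} for links supported by
    at least min_support pairs (an assumed, illustrative threshold).
    """
    counts = Counter()
    for c1, c2 in pair_alignments:
        # Skip pairs with an unaligned mate or with both mates on one contig.
        if c1 is None or c2 is None or c1 == c2:
            continue
        # Sort the names so (a, b) and (b, a) count as the same link.
        counts[tuple(sorted((c1, c2)))] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Toy alignments, invented for illustration only.
pairs = [
    ("ctg1", "ctg2"),
    ("ctg2", "ctg1"),
    ("ctg1", "ctg1"),
    ("ctg2", "ctg3"),
    (None, "ctg3"),
]
links = contig_links(pairs)
```

In this toy input only the ctg1/ctg2 link is supported by two pairs, so it is the only one retained at the default threshold. A real scaffolder would also need to consider alignment quality, read orientation and the introns spanned by RNA-seq pairs, none of which is modeled here.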