SeedsGraph: an efficient assembler for next-generation sequencing data

Chunyu Wang, Maozu Guo, Xiaoyan Liu, Yang Liu, Quan Zou
DOI: https://doi.org/10.1186/1755-8794-8-S2-S13
2015-01-01
BMC Medical Genomics
Abstract: DNA sequencing technology has been evolving rapidly and produces large, fast-growing volumes of short reads. This has led to a resurgence of research into whole-genome shotgun assembly algorithms. Our assembly algorithm starts by clustering the short reads in a cloud computing framework; the clustering groups fragments according to their similarity with respect to the original consensus long sequence. We condense each group of reads into a chain of seeds, where a seed is a substring to which reads are aligned, and then build a graph from these chains. Finally, we analyze the graph to find Euler paths, assemble the reads associated with each path into contigs, and then lay out the contigs using mate-pair information to form scaffolds. The results show that our algorithm is efficient and feasible for large read sets such as those produced by next-generation sequencing technology.
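The graph-and-Euler-path stage described in the abstract can be illustrated with a minimal sketch. It is an assumption-laden stand-in, not the SeedsGraph method: it swaps in a generic de Bruijn-style k-mer graph for the paper's seed-chain graph, finds one Euler path with Hierholzer's algorithm, and spells a single contig. The cloud-based clustering, seed condensation and mate-pair scaffolding are not modeled, and the function names, the value k = 4 and the toy reads are hypothetical.

```python
from collections import Counter, defaultdict


def build_kmer_graph(reads, k):
    """Hypothetical helper: a de Bruijn-style graph, NOT the SeedsGraph
    seed-chain construction. Each distinct k-mer becomes one edge from
    its (k-1)-prefix to its (k-1)-suffix."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):  # sorted so edge order is deterministic
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def euler_path(graph):
    """Hierholzer's algorithm. Assumes an Euler path exists and raises
    ValueError if the traversal does not use every edge."""
    out_deg = {u: len(vs) for u, vs in graph.items()}
    in_deg = Counter(v for vs in graph.values() for v in vs)
    # An Euler path starts at the node with one more outgoing than incoming
    # edge; if every node is balanced, any node with an outgoing edge works.
    start = next((u for u in graph if out_deg[u] - in_deg[u] == 1), None)
    if start is None:
        start = next(iter(graph))
    adj = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if adj.get(u):
            stack.append(adj[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())    # dead end: node is finished
    if len(path) != sum(out_deg.values()) + 1:
        raise ValueError("traversal did not use every edge")
    return path[::-1]


def spell_contig(path):
    """Overlapping (k-1)-mers along the path spell out one contig."""
    return path[0] + "".join(node[-1] for node in path[1:])


reads = ["ATGCGTA", "GCGTACC", "TACCT"]
contig = spell_contig(euler_path(build_kmer_graph(reads, k=4)))
print(contig)  # ATGCGTACCT
```

With these three overlapping toy reads, every 3-mer node is distinct, so the graph is a single chain and the Euler path spells back the sequence `ATGCGTACCT`. Real read sets contain repeats and sequencing errors, which produce branching graphs with many candidate paths; that is where a method's graph construction and path analysis matter.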