A dynamic hashing approach to build the de Bruijn graph for genome assembly

Kun Zhao, Weiguo Liu, Gerrit Voß, Wolfgang Müller-Wittig
DOI: https://doi.org/10.1109/TENCON.2013.6719008
2013-01-01
Abstract: The development of next-generation sequencing technologies has revolutionized genome research and driven an explosive increase in DNA sequencing throughput. However, as short-read databases continue to grow rapidly, these technologies face the challenges of short overlaps and high throughput. The de Bruijn graph is particularly well suited to short-read assembly because its size is not affected by the high redundancy of deep read coverage. With this property, fragment assembly is cast as finding a path that visits every edge in the graph exactly once (an Eulerian path). In this paper, we present a new method to accelerate the genome assembly procedure: we use a distributed dynamic hashing approach to construct the de Bruijn graph from short-read data. Evaluations using three paired-end datasets show that our method outperforms previous parallel and distributed assemblers on a CPU cluster system.
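To make the abstract's central idea concrete, the following is a minimal sketch of building a de Bruijn graph with hash tables that are partitioned across workers. It is not the authors' implementation: the function names (`kmers`, `owner`, `build_de_bruijn`, `edge_count`), the CRC32-based partitioning rule, and the use of Python dictionaries (which resize dynamically) as a stand-in for a dynamic hash table are all assumptions made for illustration, and the workers are simulated in a single process rather than run on a cluster.

```python
import zlib
from collections import defaultdict


def kmers(read, k):
    """Yield every k-mer (substring of length k) in a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def owner(node, num_workers):
    """Hypothetical partitioning rule: hash a node label to a worker id."""
    return zlib.crc32(node.encode("ascii")) % num_workers


def build_de_bruijn(reads, k, num_workers=4):
    """Build one hash table per simulated worker.

    Nodes are (k-1)-mers. Each distinct k-mer contributes one edge from
    its (k-1)-length prefix to its (k-1)-length suffix. The edge is stored
    in the table of the worker that owns the prefix node.
    """
    tables = [defaultdict(set) for _ in range(num_workers)]
    for read in reads:
        for kmer in kmers(read, k):
            prefix, suffix = kmer[:-1], kmer[1:]
            # Storing successors in a set means repeated k-mers add nothing.
            tables[owner(prefix, num_workers)][prefix].add(suffix)
    return tables


def edge_count(tables):
    """Count distinct edges across all worker tables."""
    return sum(len(succ) for table in tables for succ in table.values())


reads = ["ACGTAC", "CGTACG", "ACGTAC"]  # the third read duplicates the first
graph = build_de_bruijn(reads, k=4)
deep = build_de_bruijn(reads * 10, k=4)  # simulate 10x deeper coverage

print(edge_count(graph))  # 4
print(edge_count(deep))   # 4
```

The two counts match because duplicate k-mers collapse into the same edge, which illustrates why the graph size does not grow with read redundancy. A real distributed assembler would additionally need inter-node communication, reverse-complement handling, and error correction, none of which is shown here.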