Linear-time superbubble identification algorithm for genome assembly

Ljiljana Brankovic, Costas S. Iliopoulos, Ritu Kundu, Manal Mohamed, Solon P. Pissis, Fatima Vayani
DOI: https://doi.org/10.1016/j.tcs.2015.10.021
IF: 1.002
2016-01-01
Theoretical Computer Science
Abstract: DNA sequencing is the process of determining the exact order of the nucleotide bases of an individual's genome in order to catalogue sequence variation and understand its biological implications. Whole-genome sequencing techniques produce masses of data in the form of short sequences known as reads. Assembling these reads into a whole genome constitutes a major algorithmic challenge. Most assembly algorithms utilise de Bruijn graphs constructed from reads for this purpose. A critical step of these algorithms is to detect typical motif structures in the graph caused by sequencing errors and genome repeats, and filter them out; one such complex subgraph class is a so-called superbubble. In this paper, we propose an O(n+m)-time algorithm to detect all superbubbles in a directed acyclic graph with n vertices and m (directed) edges, improving the best-known O(m log m)-time algorithm by Sung et al.
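To make the notion of a superbubble concrete, the following is a minimal sketch of a simple per-source check. It is not the O(n+m) algorithm proposed in the paper. It assumes the usual informal definition (an entrance s and exit t such that every path leaving s funnels into t, nothing enters the region from outside, and t is the nearest such exit), and it assumes a simple directed graph given as an edge list. The function names `build_graph` and `superbubble_exit` are illustrative only. Running the check from every vertex costs up to O(n(n+m)) in the worst case, which is the kind of bound the linear-time result improves on.

```python
from collections import defaultdict


def build_graph(edges):
    """Build successor and predecessor sets from a list of (u, v) edges."""
    succ = defaultdict(set)
    pred = defaultdict(set)
    for u, v in edges:
        succ[u].add(v)
        pred[v].add(u)
    return succ, pred


def superbubble_exit(succ, pred, s):
    """Return the exit t of a superbubble entered at s, or None if there is none.

    Illustrative baseline only: a vertex is expanded once all of its
    parents have been expanded, so the frontier collapses to a single
    vertex exactly when every path from s has merged again.
    """
    visited = set()   # vertices already expanded
    seen = set()      # vertices reached but not yet expanded
    stack = [s]
    while stack:
        v = stack.pop()
        visited.add(v)
        seen.discard(v)
        children = succ.get(v, ())
        if not children:
            return None            # dead end (tip) inside the region
        for u in children:
            if u == s:
                return None        # cycle back to the entrance
            seen.add(u)
            if all(p in visited for p in pred[u]):
                stack.append(u)    # all parents expanded: u is ready
        if len(stack) == 1 and seen == {stack[0]}:
            t = stack[0]
            # Reject the pair if the exit points straight back at s.
            return None if s in succ.get(t, ()) else t
    return None


# A simple bubble a -> {b, c} -> d, followed by a tail d -> e.
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
succ, pred = build_graph(edges)
print(superbubble_exit(succ, pred, "a"))  # d
```

Note that under this definition a single edge such as d -> e also qualifies as a (trivial) superbubble; practical pipelines typically filter those out.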
computer science, theory & methods