Assembling sequences of DNA using an on-line algorithm based on DeBruijn graphs

Juan Manuel Ciro Restrepo, Andrés Felipe Zapata Palacio, Mauricio Toro
DOI: https://doi.org/10.48550/arXiv.1705.05105
2017-05-15
Data Structures and Algorithms
Abstract: The problem of assembling DNA fragments from the imperfect strings produced by a sequencer, classified as NP-hard when exact answers are required, is of great importance in several fields because of its relation to detecting similarities between animals, dangerous pests in crops, and so on. Algorithms and data structures created to solve this problem include the Needleman-Wunsch algorithm, De Bruijn graphs, and greedy algorithms working on overlap graphs; these approach the problem from different angles, each with advantages and disadvantages that we discuss. In this article we first summarize the research on existing solutions to the DNA assembly problem and then present an on-line solution to the same problem which, despite not considering mutations, would be able to use only as many reads as are needed to assemble a user-specified number of genes.
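As a rough illustration of the De Bruijn graph idea mentioned in the abstract, the following minimal sketch adds reads to a graph one at a time, which is the sense in which an assembler can work on-line. It is not the authors' algorithm: the function names `add_read` and `build_graph`, the choice of k = 3, and the sample reads are all assumptions made for the example, and sequencing errors and mutations are ignored.

```python
from collections import defaultdict


def add_read(graph, read, k):
    """Add every k-mer of `read` to the graph as an edge.

    Each k-mer links its (k-1)-length prefix to its (k-1)-length suffix.
    """
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        graph[kmer[:-1]].add(kmer[1:])


def build_graph(reads, k):
    """Build a De Bruijn graph by consuming reads one by one."""
    graph = defaultdict(set)
    for read in reads:  # reads could equally arrive from a stream
        add_read(graph, read, k)
    return graph


# Hypothetical error-free reads, chosen only for illustration.
reads = ["ATGGC", "GGCGT", "CGTGC"]
graph = build_graph(reads, 3)
```

Because each read only adds edges, the graph can be inspected after any read, so an assembler built on it could stop reading input as soon as the sequence it needs has been covered.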