RResolver: efficient short-read repeat resolution within ABySS

Vladimir Nikolić, Amirhossein Afshinfard, Justin Chu, Johnathan Wong, Lauren Coombe, Ka Ming Nip, René L. Warren, Inanç Birol
DOI: https://doi.org/10.1186/s12859-022-04790-z
IF: 3.307
2022-06-24
BMC Bioinformatics
Abstract: De novo genome assembly is essential to modern genomics studies. As it is not biased by a reference, it is also a useful method for studying genomes with high variation, such as cancer genomes. De novo short-read assemblers commonly use de Bruijn graphs, where nodes are sequences of equal length k, also known as k-mers. Edges in this graph are established between nodes that overlap by k − 1 bases, and nodes along unambiguous walks in the graph are subsequently merged. The selection of k is influenced by multiple factors, and optimizing this value results in a trade-off between graph connectivity and sequence contiguity. Ideally, multiple k sizes should be used, so lower values can provide good connectivity in less-covered regions and higher values can increase contiguity in well-covered regions. However, current approaches that use multiple k values do not address the scalability issues inherent to the assembly of large genomes.
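The de Bruijn graph construction and the k trade-off described in the abstract can be illustrated with a minimal Python sketch. It is not the ABySS or RResolver implementation: the function names, the toy sequences, and the simplifications (no reverse complements, no coverage counts, no error handling) are all assumptions made for illustration.

```python
def kmers(seq, k):
    """Return every length-k substring of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_de_bruijn(reads, k):
    """Map each k-mer node to the set of nodes it overlaps by k - 1 bases."""
    nodes = set()
    for read in reads:
        nodes.update(kmers(read, k))
    graph = {node: set() for node in nodes}
    for node in nodes:
        for base in "ACGT":
            successor = node[1:] + base  # shares the last k - 1 bases of node
            if successor in nodes:
                graph[node].add(successor)
    return graph


def count_branching(graph):
    """Count nodes with more than one outgoing edge (ambiguous walks)."""
    return sum(1 for successors in graph.values() if len(successors) > 1)


def count_edges(graph):
    """Count all directed edges in the graph."""
    return sum(len(successors) for successors in graph.values())


# Hypothetical toy sequence in which "ACGT" occurs twice (a short repeat).
genome = "TACGTCACGTG"
# Hypothetical low-coverage case: two reads that overlap by only 3 bases.
sparse_reads = ["TACGTCA", "TCACGTG"]

for k in (3, 6):
    full = build_de_bruijn([genome], k)
    sparse = build_de_bruijn(sparse_reads, k)
    print(k, count_branching(full), count_edges(full), count_edges(sparse))
```

In this toy case, k = 3 leaves one branching node at the repeat, while k = 6 spans the repeat and removes the branch. With the two sparsely overlapping reads, however, k = 6 loses the edges that joined them, which is the connectivity versus contiguity trade-off the abstract refers to.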
biochemical research methods,biotechnology & applied microbiology,mathematical & computational biology