Abstract: Genome-wide optical maps are high-resolution restriction maps that give a unique numeric representation to a genome. They are produced by assembling hundreds of thousands of single-molecule optical maps, which are called Rmaps. Unfortunately, there are very few choices for assembling Rmap data: only one publicly available, non-proprietary assembly method exists, along with one proprietary software package that is distributed as an executable. Furthermore, the publicly available method, by Valouev et al. (Proc Natl Acad Sci USA 103(43):15770–15775, 2006), follows the overlap-layout-consensus (OLC) paradigm and is therefore unable to scale to relatively large genomes. The algorithm behind the proprietary method, Bionano Genomics' Solve, is largely unknown. In this paper, we extend the definition of bi-labels in the paired de Bruijn graph to the context of optical mapping data, and present the first de Bruijn graph-based method for Rmap assembly. We implement our approach, which we refer to as rmapper, and compare its performance against the assembler of Valouev et al. (Proc Natl Acad Sci USA 103(43):15770–15775, 2006) and Solve by Bionano Genomics on data from three genomes: E. coli, human, and climbing perch fish (Anabas testudineus). Our method ran successfully on all three genomes, whereas the method of Valouev et al. (Proc Natl Acad Sci USA 103(43):15770–15775, 2006) ran successfully only on E. coli. Moreover, on the human genome, rmapper was at least 130 times faster than Bionano Solve, used five times less memory, and produced the highest genome fraction with zero mis-assemblies. Our software, rmapper, is written in C++ and is publicly available under the GNU General Public License at https://github.com/kingufl/Rmapper.

Fast construction of FM-index for long sequence reads

Pfp-fm: an accelerated FM-index

Efficient Construction and Utilization of k-Ordered FM-indexes with kISS for Ultra-Fast Read Mapping in Large Genomes

Acceleration of FM-index Queries Through Prefix-free Parsing

BWT construction and search at the terabase scale

A Memory-Efficient FM-Index Constructor for Next-Generation Sequencing Applications on FPGAs

A novel fast multiple nucleotide sequence alignment method based on FM-index

Nucleotide String Indexing using Range Matching

FM-Indexing Grammars Induced by Suffix Sorting for Long Patterns

Fast and accurate long-read alignment with Burrows–Wheeler transform

Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences

b-move: faster bidirectional character extensions in a run-length compressed index

CIndex: compressed indexes for fast retrieval of FASTQ files

Fast and efficient Rmap assembly using the Bi-labelled de Bruijn graph

An efficient Burrows-Wheeler transform-based aligner for short read mapping

Fast and Accurate Read Alignment for Resequencing.

String Partition for Building Long BWTs

μ-PBWT: a lightweight r-indexing of the PBWT for storing and querying UK Biobank data

Mantis: A Fast, Small, and Exact Large-Scale Sequence-Search Index

Movi: a fast and cache-efficient full-text pangenome index

Building a pangenome alignment index via recursive prefix-free parsing