Nanopore Sequencing Technology and Tools for Genome Assembly: Computational Analysis of the Current State, Bottlenecks and Future Directions

Damla Senol Cali, Jeremie S. Kim, Saugata Ghose, Can Alkan, Onur Mutlu
DOI: https://doi.org/10.1093/bib/bby017
2018-03-06
Abstract: Nanopore sequencing technology has the potential to render other sequencing technologies obsolete with its ability to generate long reads and provide portability. However, high error rates of the technology pose a challenge while generating accurate genome assemblies. The tools used for nanopore sequence analysis are of critical importance as they should overcome the high error rates of the technology. Our goal in this work is to comprehensively analyze current publicly available tools for nanopore sequence analysis to understand their advantages, disadvantages, and performance bottlenecks. It is important to understand where the current tools do not perform well to develop better tools. To this end, we 1) analyze the multiple steps and the associated tools in the genome assembly pipeline using nanopore sequence data, and 2) provide guidelines for determining the appropriate tools for each step. We analyze various combinations of different tools and expose the tradeoffs between accuracy, performance, memory usage and scalability. We conclude that our observations can guide researchers and practitioners in making conscious and effective choices for each step of the genome assembly pipeline using nanopore sequence data. Also, with the help of bottlenecks we have found, developers can improve the current tools or build new ones that are both accurate and fast, in order to overcome the high error rates of the nanopore sequencing technology.
Genomics
What problem does this paper attempt to address?
This paper addresses the difficulty of generating accurate genome assemblies from nanopore sequencing data. Nanopore sequencing could replace other sequencing technologies thanks to its long reads and portability, but its high error rate makes accurate assembly hard. The paper therefore comprehensively analyzes the publicly available tools for nanopore sequence analysis to understand their advantages, disadvantages, and performance bottlenecks, so that the shortcomings it identifies can inform the development of better tools. Specifically, the paper focuses on the following aspects:
1. **Selection of basecalling tools**: Basecalling is a crucial step in overcoming the high error rate of nanopore sequencing. The paper evaluates the basecalling tools Metrichor, Nanonet, Scrappie, Nanocall, and DeepNano to determine which of them reduce the error rate more effectively.
2. **Performance of read-to-read overlap finding tools**: The paper compares the accuracy and performance of GraphMap and Minimap in read-to-read overlap detection. The two tools are similarly accurate, but Minimap uses less memory and runs faster.
3. **Trade-offs of assembly tools**: The paper explores the trade-off between accuracy and performance when selecting an assembly tool. For example, Miniasm is fast but less accurate, which makes it suitable for a rapid preliminary assembly whose accuracy is then improved by a later polishing step.
4. **Effect of polishing tools**: The paper evaluates how well the state-of-the-art polishing tool Racon generates high-quality consensus sequences while providing a significant speedup, and compares it with another polishing tool, Nanopolish.
Through these analyses, the paper aims to help researchers and practitioners make informed and effective tool selections when assembling genomes from nanopore sequence data. It also points out the bottlenecks of the current tools, giving developers directions for improving them or building new tools that are both accurate and fast.
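The four pipeline steps and the tools named for each can be laid out as a small data structure, which makes the idea of comparing "various combinations of different tools" concrete. The following is a minimal illustrative sketch, not the authors' evaluation code: the step names, the `PIPELINE` dictionary, and the `tool_combinations` helper are hypothetical, and only tools mentioned in this summary are listed. It does not run any of the tools, and not every enumerated combination is necessarily compatible in practice.

```python
from itertools import product

# Pipeline steps and the tools named in the summary. The paper may evaluate
# further tools; only those mentioned here are listed (assumption).
PIPELINE = {
    "basecalling": ["Metrichor", "Nanonet", "Scrappie", "Nanocall", "DeepNano"],
    "overlap_finding": ["GraphMap", "Minimap"],
    "assembly": ["Miniasm"],
    "polishing": ["Racon", "Nanopolish"],
}


def tool_combinations(pipeline):
    """Yield one dict per way of picking a single tool for every step."""
    steps = list(pipeline)
    for choice in product(*(pipeline[step] for step in steps)):
        yield dict(zip(steps, choice))


combos = list(tool_combinations(PIPELINE))
print(len(combos))  # 5 * 2 * 1 * 2 = 20 candidate pipelines
print(combos[0])
```

Each resulting dictionary describes one candidate pipeline, which could then be benchmarked for accuracy, runtime, and memory usage, the trade-offs the paper reports.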