Abstract: Long-read-only bacterial genome assemblies usually contain residual errors, most commonly homopolymer-length errors. Short-read polishing tools can use short reads to fix these errors, but most rely on short-read alignment, which is unreliable in repeat regions. Errors in such regions are therefore challenging to fix and often remain after short-read polishing. Here we introduce Polypolish, a new short-read polisher which uses all-per-read alignments to repair errors in repeat sequences that other polishers cannot. Polypolish performed well in benchmarking tests using both simulated and real reads, and it almost never introduced errors during polishing. The best results were achieved by using Polypolish in combination with other short-read polishers.

Author summary: Recent improvements in Oxford Nanopore Technologies sequencing platforms and assembly algorithms have made it easier than ever to generate complete bacterial genome sequences. However, Oxford Nanopore genome sequences suffer from errors that limit their utility in downstream analyses. To fix these errors, one can ‘polish’ the genome with Illumina sequencing, exploiting the fact that Oxford Nanopore and Illumina sequencing have different error profiles. Several polishing tools can fix most errors in an Oxford Nanopore genome, but they struggle with errors in repetitive regions of the genome. With this in mind, we have developed a polisher, Polypolish, which uses a novel approach that allows it to fix more errors in genomic repeats. Our results show that Polypolish is both effective at repairing sequence errors and very unlikely to introduce new errors. Polypolish can often fix errors that other polishers cannot, and vice versa, so the best results come from using a combination of tools. Polypolish therefore has an important role in bacterial genome assembly methods that aim for the highest possible sequence accuracy.

NeuralPolish: a Novel Nanopore Polishing Method Based on Alignment Matrix Construction and Orthogonal Bi-GRU Networks

BlockPolish: Accurate Polishing of Long-Read Assembly Via Block Divide-and-conquer

NextPolish: a fast and efficient genome polishing tool for long-read assembly

MultiNanopolish: Refined Grouping Method for Reducing Redundant Calculations in Nanopolish

The genome polishing tool POLCA makes fast and accurate corrections in genome assemblies

Highly accurate assembly polishing with DeepPolisher

NextPolish2: A Repeat-aware Polishing Tool for Genomes Assembled Using HiFi Long Reads

Polypolish: short-read polishing of long-read bacterial genome assemblies

MetaCONNET: A Metagenomic Polishing Tool for Long-Read Assemblies

Homopolish: a method for the removal of systematic errors in nanopore sequencing by homologous polishing

GoldPolish-Target: Targeted long-read genome assembly polishing

Comparative evaluation of Nanopore polishing tools for microbial genome assembly and polishing strategies for downstream analysis

De Novo Nanopore Read Quality Improvement Using Deep Learning

MiniScrub: de novo long read scrubbing using approximate alignment and deep learning

How low can you go? Short-read polishing of Oxford Nanopore bacterial genome assemblies

Performance of neural network basecalling tools for Oxford Nanopore sequencing

Constructing telomere-to-telomere diploid genome by polishing haploid nanopore-based assembly

Benchmarking short and long read polishing tools for nanopore assemblies: achieving near-perfect genomes for outbreak isolates

An Iterative Approach to Polish the Nanopore Sequencing Basecalling for Therapeutic RNA Quality Control

Fast and Accurate Assembly of Nanopore Reads Via Progressive Error Correction and Adaptive Read Selection