Abstract:

Background: Because high-throughput sequencing reads are short, genome assembly introduces errors that can adversely affect downstream data analysis. Several tools have been developed to eliminate these errors by either 1) comparing the assembled sequences with a similar reference genome, or 2) analyzing paired-end reads aligned to the assembled sequences and identifying inconsistent features along mis-assembled sequences. However, the former approach cannot distinguish assembly errors from real structural variations between the target genome and the reference genome, while the latter can produce many false positives (correctly assembled sequences reported as mis-assembled).

Results: We present misFinder, a tool that aims to identify assembly errors with high accuracy in an unbiased way and to correct these errors at their mis-assembled positions, improving assembly accuracy for downstream analysis. It combines information from a reference (or closely related reference) genome with paired-end reads aligned to the assembled sequence. Assembly errors, and correct assemblies that correspond to structural variations, are detected by comparing the reference genome with the assembled sequence. Different types of assembly errors are then distinguished within the mis-assembled sequence by analyzing the aligned paired-end reads, using multiple features derived from coverage and the consistency of insert distance, to obtain high-confidence error calls.

Conclusions: We tested misFinder on both simulated and real paired-end read data, and it gave accurate error calls with very few miscalls. We further compared misFinder with QUAST and REAPR. misFinder outperformed both by 1) identifying more true positive mis-assemblies with very few false positives and false negatives, and 2) distinguishing correct assemblies that correspond to structural variations from mis-assembled sequence. misFinder can be freely downloaded from https://github.com/hitbio/misFinder.
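
The paired-end evidence mentioned in the abstract (coverage and consistency of insert distance) can be illustrated with a minimal Python sketch. This is not misFinder's implementation: the function name `flag_suspect_windows`, the window size, and both thresholds are hypothetical choices made only for illustration, and the input is assumed to be a list of `(start, end)` fragment coordinates already derived from properly aligned read pairs.

```python
import math
from statistics import mean, pstdev


def flag_suspect_windows(fragments, contig_len, window=100, min_cov=2, z_cut=3.0):
    """Flag contig windows with weak or inconsistent paired-end support.

    fragments: list of (start, end) spans covered by aligned read pairs
               (0-based, end exclusive). Hypothetical input format.
    Returns a list of (window_start, window_end, reason) tuples.
    """
    sizes = [end - start for start, end in fragments]
    mu, sigma = mean(sizes), pstdev(sizes)  # library-wide insert statistics

    flagged = []
    for w_start in range(0, contig_len, window):
        w_end = min(w_start + window, contig_len)
        mid = (w_start + w_end) // 2
        # Insert sizes of fragments that physically span the window midpoint.
        local = [e - s for s, e in fragments if s <= mid < e]

        if len(local) < min_cov:
            # Too few pairs bridge this position: possible breakpoint.
            flagged.append((w_start, w_end, "low_coverage"))
            continue

        if sigma > 0:
            # Crude z-score of the local mean insert size (assumed heuristic).
            z = (mean(local) - mu) / (sigma / math.sqrt(len(local)))
            if abs(z) > z_cut:
                flagged.append((w_start, w_end, "insert_size"))
    return flagged


# Toy data: 300 bp fragments tiled every 25 bp along a 1000 bp contig.
tiled = [(s, s + 300) for s in range(0, 701, 25)]
# Simulate a breakpoint at position 450 that no fragment bridges.
broken = [(s, e) for s, e in tiled if e <= 450 or s > 450]

print(flag_suspect_windows(broken, 1000))
```

In this toy case only the window containing the unbridged position is reported. A real tool would combine several such features and, as the abstract describes, cross-check candidate regions against a reference genome before calling an error.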

MEC: Misassembly Error Correction in Contigs Using a Combination of Paired-End Reads and GC-contents

MEC: Misassembly Error Correction in Contigs Based on Distribution of Paired-End Reads and Statistics of GC-contents

PECC: Correcting Contigs Based on Paired-End Read Distribution

Metamic: Reference-Free Misassembly Identification and Correction of De Novo Metagenomic Assemblies

misFinder: identify mis-assemblies in an unbiased manner using reference and paired-end reads

MECAT: fast mapping, error correction, and de novo assembly for single-molecule sequencing reads

MECAT: an ultra-fast mapping, error correction and de novo assembly tool for single-molecule sequencing reads

MAC: Merging Assemblies by Using Adjacency Algebraic Model and Classification

MAECI: A Pipeline for Generating Consensus Sequence with Nanopore Sequencing Long-read Assembly and Error Correction

Evaluation of the impact of Illumina error correction tools on de novo genome assembly

EPGA-SC: A Framework for de novo Assembly of Single-Cell Sequencing Reads

MapReduce for Accurate Error Correction of Next-Generation Sequencing Data

Fec: a Fast Error Correction Method Based on Two-Rounds Overlapping and Caching.

Efficient assembly of nanopore reads via highly accurate and intact error correction

A De Novo Assembly Method for Metagenomic DNA Reads with Mate Pairs

Minimum error correction-based haplotype assembly: Considerations for long read data

Improving de novo assembly based on read classification.

Assessment of Metagenomic Assemblers Based on Hybrid Reads of Real and Simulated Metagenomic Sequences

Fast and Accurate Assembly of Nanopore Reads Via Progressive Error Correction and Adaptive Read Selection

Error filtering, pair assembly and error correction for next-generation sequencing reads