Abstract

Background: Next-generation sequencing (NGS) technology offers ultra-high throughput, but its reads are remarkably short compared with conventional Sanger sequencing. Paired-end NGS can computationally extend the read length, but the inherent gap between the two ends causes considerable practical inconvenience. Now that Illumina paired-end sequencing can read both ends of 600 bp or even 800 bp DNA fragments, filling in the gaps between paired ends to produce accurate long reads is an intriguing but challenging problem.

Results: We have developed a new technology, referred to as pseudo-Sanger (PS) sequencing. It fills in the gaps between paired ends and can generate near error-free sequences equivalent in length to conventional Sanger reads, but with the high throughput of NGS. The major novelty of the PS method is that gap filling is based on local assembly of paired-end reads that overlap with either end. As a result, gaps in repetitive genomic regions can be filled in correctly. PS sequencing starts with short reads from NGS platforms, using a series of paired-end libraries of stepwise decreasing insert sizes. A computational method then transforms these special paired-end reads into long, near error-free PS sequences, which correspond in length to the largest insert sizes. The PS construction has three advantages over untransformed reads: gap filling, error correction and heterozygote tolerance. Among the many applications of the PS construction is de novo genome assembly, which we tested in this study. Assembly of PS reads from a non-isogenic strain of Drosophila melanogaster yields an N50 contig of 190 kb, a 5-fold improvement over existing de novo assembly methods and a 3-fold advantage over the assembly of long reads from 454 sequencing.

Conclusions: Our method generates near error-free long reads from NGS paired-end sequencing. We demonstrate that de novo assembly benefits substantially from these Sanger-like reads. In addition, the long reads could be applied to tasks such as structural variation detection and metagenomics.
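
The gap-filling idea in the abstract (joining the two ends of a long-insert pair by locally assembling reads that overlap them) can be illustrated with a toy sketch. This is not the authors' implementation: the greedy suffix/prefix extension, the function names `suffix_prefix_overlap` and `fill_gap`, and the parameters `min_overlap` and `max_len` are all assumptions made for illustration. It also assumes error-free reads that have already been oriented onto the same strand.

```python
from typing import List, Optional


def suffix_prefix_overlap(a: str, b: str, min_overlap: int) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix
    of `b`, or 0 if no such overlap of at least `min_overlap` bases exists."""
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def fill_gap(left: str, right: str, inner_reads: List[str],
             min_overlap: int = 4, max_len: int = 1000) -> Optional[str]:
    """Toy gap filler (hypothetical, for illustration only).

    Starting from the `left` end of a long-insert pair, greedily append the
    unused inner read with the longest suffix/prefix overlap until the
    `right` end can be joined. Returns the joined sequence, or None if the
    gap cannot be closed.
    """
    seq = left
    unused = list(inner_reads)
    while len(seq) <= max_len:
        # Stop as soon as the right end overlaps the growing sequence.
        k = suffix_prefix_overlap(seq, right, min_overlap)
        if k:
            return seq + right[k:]
        # Otherwise pick the inner read that overlaps best and adds new bases.
        best, best_k = None, 0
        for read in unused:
            k = suffix_prefix_overlap(seq, read, min_overlap)
            if k > best_k and k < len(read):
                best, best_k = read, k
        if best is None:
            return None  # no read extends the sequence: gap stays open
        seq += best[best_k:]
        unused.remove(best)
    return None  # exceeded the maximum allowed fragment length


if __name__ == "__main__":
    # Made-up 40 bp fragment; the two ends are 10 bp each with a 20 bp gap.
    template = "ATGCGTACCTGAAGTCGGATCTTACGCAAGGTTCCAGTAC"
    left, right = template[:10], template[30:]
    # Inner reads stand in for reads from libraries with smaller insert sizes.
    inner = [template[5:17], template[12:24], template[19:35]]
    print(fill_gap(left, right, inner) == template)
```

A greedy extension like this would be misled by repeats and sequencing errors; restricting the candidate pool to reads anchored near one pair is what keeps the assembly local, and a realistic version would also need mismatch tolerance and reverse-complement handling.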

Pseudo-Sanger Sequencing: Massively Parallel Production of Long and Near Error-Free Reads Using NGS Technology
