Abstract:

Background: Analyses that use genome assemblies are critically affected by the contiguity, completeness, and accuracy of those assemblies. In recent years, single-molecule sequencing techniques that generate long reads have become available and have enabled substantial improvements in contig length and genome completeness, especially for large genomes (>100 Mb), although bioinformatic tools for these applications are still limited.

Findings: We developed a software tool to close sequence gaps in genome assemblies, TGS-GapCloser, that uses low-depth (∼10×) long single-molecule reads. The algorithm extracts reads that bridge gap regions between 2 contigs within a scaffold, error-corrects only the candidate reads, and assigns the best sequence data to each gap. As a demonstration, we used TGS-GapCloser to improve the scaftig NG50 value of 3 human genome assemblies by 24-fold on average with only ∼10× coverage of Oxford Nanopore or Pacific Biosciences reads, filling up to 94.8% of gaps with sequence data at 97.7% positive predictive value. These improved assemblies achieve 99.998% (Q46) single-base accuracy, with the final inserted sequences having 99.97% (Q35) accuracy despite the high raw error rate of single-molecule reads. This enables high-quality downstream analyses, including up to a 31-fold increase in the scaftig NGA50 and up to 13.1% more complete BUSCO genes. Additionally, we show that even in ultra-large genome assemblies, such as the ginkgo (∼12 Gb), TGS-GapCloser can cover 71.6% of gaps with sequence data.

Conclusions: TGS-GapCloser can close gaps in large genome assemblies using raw long reads quickly and cost-effectively. The final assemblies generated by TGS-GapCloser have improved contiguity and completeness while maintaining high accuracy. The software is available at https://github.com/BGI-Qingdao/TGS-GapCloser.
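
The two kinds of figures quoted in the abstract, Phred-scaled accuracy (Q46, Q35) and scaftig NG50, can be illustrated with a minimal sketch. It assumes that a scaftig is a maximal stretch of scaffold sequence containing no N characters, that NG50 follows the usual definition (the length of the piece at which the cumulative length of pieces sorted from longest to shortest first reaches half of an assumed genome size), and that the reported Q values are rounded down. The function names `phred_q`, `scaftig_lengths`, and `ng50` are hypothetical and are not part of TGS-GapCloser.

```python
import math
import re


def phred_q(accuracy: float) -> float:
    """Phred-scaled quality, Q = -10 * log10(error rate), for a per-base accuracy."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return -10.0 * math.log10(1.0 - accuracy)


def scaftig_lengths(scaffolds):
    """Split each scaffold at runs of N (assumed to mark gaps); return piece lengths."""
    lengths = []
    for seq in scaffolds:
        lengths.extend(len(piece) for piece in re.split(r"[Nn]+", seq) if piece)
    return lengths


def ng50(lengths, genome_size):
    """Length of the piece at which the cumulative sum of pieces, sorted from
    longest to shortest, first reaches half of genome_size (0 if never reached)."""
    total = 0
    for length in sorted(lengths, reverse=True):
        total += length
        if 2 * total >= genome_size:
            return length
    return 0


# Toy input (made up for illustration): two scaffolds with N-filled gaps.
toy_scaffolds = ["ACGTNNNNACGTACGT", "ACNNAC"]
print(ng50(scaftig_lengths(toy_scaffolds), genome_size=20))  # 4
print(math.floor(phred_q(0.99998)), math.floor(phred_q(0.9997)))  # 46 35
```

In this toy case the scaftigs have lengths 4, 8, 2, and 2, so the two longest pieces together reach half of the assumed 20-base genome and the NG50 is 4. Closing a gap merges neighboring scaftigs into one longer piece, which is why gap filling raises this statistic.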

Telomere-to-telomere assembly by preserving contained reads

Scalable telomere-to-telomere assembly for diploid and polyploid genomes with double graph

Pre-Assembly NGS Correction of ONT Reads Achieves HiFi-Level Assembly Quality

Association of SCN1A, SCN2A and ABCC2 gene polymorphisms with the response to antiepileptic drugs in Chinese Han patients with epilepsy.

TGS-GapCloser: A fast and accurate gap closer for large genomes with low coverage of error-prone long reads

Constructing telomere-to-telomere diploid genome by polishing haploid nanopore-based assembly

Telomere-to-Telomere Phased Genome Assembly Using HERRO-Corrected Simplex Nanopore Reads

Gapless assembly of complete human and plant chromosomes using only nanopore sequencing

Telomere-to-telomere assembly of diploid chromosomes with Verkko

GapPredict: A Language Model for Resolving Gaps in Draft Genome Assemblies

Design of PCI 2.2 Target Controller to Support Prefetch Request

GapReduce: A Gap Filling Algorithm Based on Partitioned Read Sets

quarTeT: a telomere-to-telomere toolkit for gap-free genome assembly and centromeric repeat identification

Transcriptome assembly from long-read RNA-seq alignments with StringTie2

Faucet: streaming de novo assembly graph construction

Assembly of chromosome-scale contigs by efficiently resolving repetitive sequences with long reads

Genome assembly in the telomere-to-telomere era

Training physicians to be administrators.

Parallel String Graph Construction and Transitive Reduction for De Novo Genome Assembly

Assembly of repetitive regions using next-generation sequencing data

RResolver: efficient short-read repeat resolution within ABySS