Abstract:

Background. Metagenomics has transformed our understanding of microbial diversity across ecosystems, with recent advances enabling de novo assembly of genomes from metagenomes. These metagenome-assembled genomes are critical for providing ecological, evolutionary, and metabolic context for the microbes and viruses that have yet to be cultivated. Metagenomes can now be generated from nanogram to subnanogram amounts of DNA. However, these libraries require several rounds of PCR amplification before sequencing, and recent data suggest that they typically yield smaller and more fragmented assemblies than regular metagenomes.

Methods. Here we evaluate de novo assembly methods for 169 PCR-amplified metagenomes, including 25 for which an unamplified counterpart is available, to optimize assembly approaches specifically for PCR-amplified libraries. We first evaluated coverage bias by mapping reads from PCR-amplified metagenomes onto reference contigs obtained from unamplified metagenomes of the same samples. We then compared different assembly pipelines in terms of assembly size (number of bp in contigs ≥ 10 kb) and error rates to determine which are best suited for PCR-amplified metagenomes.

Results. Read mapping analyses revealed that the depth of coverage within individual genomes is significantly more uneven in PCR-amplified datasets than in unamplified metagenomes, with regions of high depth of coverage enriched in short inserts. This enrichment scales with the number of PCR cycles performed and is presumably due to preferential amplification of short inserts. Standard assembly pipelines are confounded by this type of coverage unevenness, so we evaluated other assembly options to mitigate these issues. We found that a pipeline combining read deduplication and an assembly algorithm originally designed to recover genomes from libraries generated after whole genome amplification (single-cell SPAdes) frequently improved assembly of contigs ≥ 10 kb by 10- to 100-fold for low-input metagenomes.

Conclusions. PCR-amplified metagenomes have enabled scientists to explore communities traditionally challenging to describe, including some with extremely low biomass or from which DNA is particularly difficult to extract. Here we show that a modified assembly pipeline can improve de novo genome assembly from PCR-amplified datasets and enables better genome recovery from low-input metagenomes.
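
As an illustration of the quantities named in the abstract, the following minimal Python sketch computes the assembly size metric (total bp in contigs ≥ 10 kb), a simple measure of coverage unevenness, and a naive exact-duplicate read-pair filter. The function names are hypothetical, the coefficient of variation is only one possible unevenness measure (the abstract does not state which statistic was used), and the deduplication shown here is a simplified stand-in, not the tool used in the study.

```python
from statistics import mean, pstdev


def assembly_size(contig_lengths, min_len=10_000):
    """Total number of bp in contigs at least `min_len` long (10 kb by default)."""
    return sum(n for n in contig_lengths if n >= min_len)


def coverage_cv(depths):
    """Coefficient of variation of per-position depth of coverage.

    Assumption: CV (population standard deviation / mean) is used here as a
    stand-in for "unevenness"; higher values mean more uneven coverage.
    """
    mu = mean(depths)
    if mu == 0:
        return 0.0
    return pstdev(depths) / mu


def deduplicate_pairs(read_pairs):
    """Drop exact duplicate read pairs, keeping the first occurrence of each.

    Assumption: a duplicate is a pair whose forward and reverse sequences are
    both identical to an earlier pair. Real deduplication tools may also
    tolerate mismatches or consider mapping positions.
    """
    # dict preserves insertion order, so the first copy of each pair is kept
    return list(dict.fromkeys(read_pairs))


if __name__ == "__main__":
    lengths = [5_000, 10_000, 25_000]
    print("assembly size (bp in contigs >= 10 kb):", assembly_size(lengths))
    print("coverage CV:", coverage_cv([12, 9, 40, 3, 11]))
    print("unique pairs:", deduplicate_pairs([("ACGT", "TTGA"), ("ACGT", "TTGA")]))
```

Comparing `assembly_size` across pipelines on the same sample, and `coverage_cv` between an amplified library and its unamplified counterpart, mirrors the two comparisons described in the Methods, under the assumptions stated above.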

Optimizing De Novo Genome Assembly from PCR-amplified Metagenomes
