Simple, reference-independent analyses help optimize hybrid assembly of microbial community metagenomes

Garrett J. Smith, Theo van Alen, Maartje van Kessel, Sebastian Lücker
DOI: https://doi.org/10.1101/2023.09.12.557416
2024-02-08
Abstract: Hybrid metagenomic assembly of microbial communities, leveraging both long- and short-read sequencing technologies, is becoming an increasingly accessible approach, yet its widespread application faces several challenges. High-quality references may not be available for the assembly accuracy comparisons commonly used in benchmarking, and certain aspects of hybrid assembly may require dataset-dependent, empirically guided optimization rather than application of a uniform approach. In this study, several simple, reference-free characteristics – gene lengths and read recruitment – were analyzed as reliable proxies of assembly quality to guide hybrid assembly optimization. These characteristics were further explored in relation to reference-dependent genome- and gene-centric analyses that are common in microbial community metagenomic studies. Here, two laboratory-scale bioreactors were sequenced with short and long read platforms, and assembled with commonly used software packages. Following long read assembly, long read correction and short read polishing were iterated to resolve errors. Each iteration in this process was shown to have a substantial effect on gene- and genome-centric community composition. Simple, reference-free assembly characteristics, specifically changes in gene fragmentation and short read recruitment, explored throughout this process replicated patterns of more advanced analyses seen in published comparative studies, and are therefore suitable proxies for hybrid metagenome assembly accuracy that save computational resources. Hybrid metagenomic sequencing approaches will likely remain relevant due to the low costs of short read sequencing; it is therefore imperative that users are equipped to estimate assembly accuracy prior to downstream gene- and genome-centric analyses.
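The two reference-free characteristics named in the abstract, gene lengths and short read recruitment, can be tabulated per correction or polishing iteration with very little code. The following is a minimal sketch, not the authors' pipeline. It assumes that gene lengths (from any gene caller) and mapped/total short read counts (from any read mapper) have already been extracted for each iteration. The function names, iteration labels, and numbers are hypothetical, and the choice of the median as the gene-length summary is an assumption of this sketch.

```python
from statistics import median


def summarize_iteration(gene_lengths, mapped_reads, total_reads):
    """Return (median gene length in bp, fraction of short reads recruited)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return median(gene_lengths), mapped_reads / total_reads


def compare_iterations(iterations):
    """Summarize each iteration.

    iterations maps a label to (gene_lengths, mapped_reads, total_reads).
    Returns a list of (label, median_gene_length, recruitment_fraction).
    """
    rows = []
    for name, (lengths, mapped, total) in iterations.items():
        med_len, frac = summarize_iteration(lengths, mapped, total)
        rows.append((name, med_len, frac))
    return rows


# Toy, made-up values purely for illustration (not data from the study).
example = {
    "long_read_assembly": ([300, 450, 600, 900], 700, 1000),
    "after_polishing": ([450, 900, 1200, 1500], 850, 1000),
}

for name, med_len, frac in compare_iterations(example):
    print(f"{name}: median gene length {med_len} bp, recruitment {frac:.2f}")
```

In this sketch, an increase in median gene length (less gene fragmentation) together with an increase in recruited short reads across iterations would be read as a sign of fewer residual assembly errors. Where to stop iterating remains a dataset-dependent judgment.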
Bioinformatics