To assemble or not to assemble: metagenomic profiling of microbially mediated biogeochemical pathways in complex communities

Jiayin Zhou, Wen Song, Qichao Tu
DOI: https://doi.org/10.1093/bib/bbac594
IF: 9.5
2022-12-29
Briefings in Bioinformatics
Abstract: High-throughput profiling of microbial functional traits involved in various biogeochemical cycling pathways using shotgun metagenomic sequencing has been routinely applied in microbial ecology and environmental science. Multiple bioinformatics data processing approaches are available, including assembly-based (single-sample assembly and multi-sample assembly) and read-based (merged reads and raw data). However, it remains unclear how these approaches differ in data analysis and how they affect the interpretation of results. In this study, using two typical shotgun metagenome datasets recovered from geographically distant coastal sediments, the performance of different data processing approaches was comparatively investigated from both technical and biological/ecological perspectives. Microbially mediated biogeochemical cycling pathways, including nitrogen cycling, sulfur cycling and B12 biosynthesis, were analyzed. Multi-sample assembly provided the largest amount of usable information for the targeted functional traits, at a high cost in computational resources and running time. Single-sample assembly and read-based analysis were comparable in the amount of usable information obtained, but the former was much more time- and resource-consuming. Critically, the choice of approach introduced much stronger variation in microbial profiles than biological differences did. However, community-level differences between the two sampling sites could be consistently observed regardless of the approach used. In choosing an appropriate approach, researchers should balance the trade-offs between multiple factors, including the scientific question, the amount of usable information, computational resources and time cost. This study is expected to provide valuable technical insights and guidelines for the various approaches used for metagenomic data analysis.
biochemical research methods, mathematical & computational biology