Finding the right fit: A comprehensive evaluation of short-read and long-read sequencing approaches to maximize the utility of clinical microbiome data
Jeanette L. Gehrig, Daniel M. Portik, Mark D. Driscoll, Eric Jackson, Shreyasee Chakraborty, Dawn Gratalo, Meredith Ashby, Ricardo Valladares
DOI: https://doi.org/10.1101/2021.08.31.458285
2021-09-01
Abstract: A longstanding challenge in human microbiome research is achieving the taxonomic and functional resolution needed to generate testable hypotheses about the gut microbiome’s impact on health and disease. More recently, this challenge has extended to a need for in-depth understanding of the pharmacokinetics and pharmacodynamics of clinical microbiome-based interventions. Whole genome metagenomic sequencing provides high taxonomic resolution and information on metagenome functional capacity, but the required deep sequencing is costly. For this reason, short-read sequencing of the bacterial 16S ribosomal RNA (rRNA) gene is the standard for microbiota profiling, despite its poor taxonomic resolution. The recent fall in costs and improved fidelity of long-read sequencing warrant an evaluation of this approach for clinical microbiome analysis. We used samples from participants enrolled in a Phase 1b clinical trial of a novel live biotherapeutic product to perform a comparative analysis of short-read and long-read amplicon and metagenomic sequencing approaches and to assess their value for generating informative and actionable clinical microbiome data. Comparison of ubiquitous short-read 16S rRNA amplicon profiling with long-read profiling of the 16S-ITS-23S rRNA amplicon showed that only the latter provided strain-level community resolution and insight into novel taxa. Across all methods, overall community taxonomic profiles were comparable and relationships between samples were conserved, highlighting the accuracy of modern microbiome analysis pipelines. All methods identified an active ingredient strain in treated study participants, though detection confidence was higher for long-read methods. Read coverage from both metagenomic methods provided evidence of active ingredient strain replication in some treated participants. Compared with short-read metagenomics, approximately twice the proportion of long reads was assigned functional annotations (63% vs. 34%). Finally, similar bacterial metagenome-assembled genomes (MAGs) were recovered with short-read and long-read metagenomic methods, although MAGs recovered from long reads were more complete. Overall, despite higher costs, long-read microbiome characterization provides added scientific value for clinical microbiome research in the form of higher taxonomic and functional resolution and improved recovery of microbial genomes compared with traditional short-read methodologies.

Data Summary
All supporting data, code and protocols have been provided within the article or as supplementary data files. Two supplementary figures and four supplementary tables are available with the online version of this article. Sequencing data are accessible in the National Center for Biotechnology Information (NCBI) database under BioProject accession number PRJNA754443. The R code and additional data files used for analysis and figure generation are accessible in a GitHub repository (https://github.com/jeanette-gehrig/Gehrig_et_al_sequencing_comparison).

Impact Statement
Accurate sequencing and analysis are essential for informative microbiome profiling, which is critical for the development of novel microbiome-targeted therapeutics. Recent improvements in long-read sequencing technology provide a promising, but more costly, alternative to ubiquitous short-read sequencing. To our knowledge, a direct comparison of the informational value of short-read and HiFi long-read sequencing approaches has not been reported for clinical microbiome samples. Using samples from participants in a Phase 1b trial of a live biotherapeutic product, we compare microbiome profiles generated from short-read and long-read sequencing for both amplicon-based 16S ribosomal RNA profiling and metagenomic sequencing.
Although overall taxonomic profiles were similar across methods, only long-read amplicon sequencing provided strain-level resolution, and long-read metagenomic sequencing resulted in a significantly greater proportion of functionally annotated genes. Detection of a live biotherapeutic active ingredient strain in treated participants was achieved with all methods, and both metagenomic methods provided evidence of active replication of this strain in some participants. Similar taxa were recovered through metagenomic assemblies of short and long reads, although assemblies were more complete with long reads. Overall, we show the utility of long-read microbiome sequencing, compared directly with commonly used short-read methods, for clinically relevant microbiome profiling.