CZ ID: a cloud-based, no-code platform enabling advanced long read metagenomic analysis

Sara E. Simmonds, Lynn Ly, John Beaulaurier, Ryan Lim, Todd Morse, Sri Gowtham Thakku, Karyna Rosario, Juan Caballero Perez, Andreas Puschnik, Lusajo Mwakibete, Scott Hickey, Cristina M. Tato, CZ ID Team, Katrina Kalantar
DOI: https://doi.org/10.1101/2024.02.29.579666
2024-03-02
Abstract: Metagenomics has enabled the rapid, unbiased detection of microbes across diverse sample types, leading to exciting discoveries in infectious disease, microbiome, and viral research. However, the analysis of metagenomic data is often complex and computationally resource-intensive. CZ ID is a free, cloud-based genomic analysis platform that enables researchers to detect microbes using metagenomic data, identify antimicrobial resistance genes, and generate viral consensus genomes. With CZ ID, researchers can upload raw sequencing data, find matches in NCBI databases, get per-sample taxon metrics, and perform a variety of analyses and data visualizations. The intuitive interface and interactive visualizations make exploring and interpreting results simple. Here, we describe the expansion of CZ ID with a new long read mNGS pipeline that accepts Oxford Nanopore generated data. We report benchmarking of a standard mock microbial community dataset against Kraken2, a widely used tool for metagenomic analysis. We evaluated the ability of this new pipeline to detect divergent viruses using simulated datasets. We also assessed the detection limit of a spiked-in virus to a cell line as a proxy for clinical samples. Lastly, we detected known and novel viruses in previously characterized disease vector (mosquitoes) samples.
Bioinformatics
What problem does this paper attempt to address?
The problem this paper attempts to solve is: **simplifying and optimizing long-read metagenomic data analysis, especially for data generated on the Oxford Nanopore sequencing platform, in order to improve the accuracy and efficiency of microbial detection**. Specifically, the paper introduces a new module of the CZ ID (Chan Zuckerberg ID) platform, the mNGS Nanopore pipeline, which aims to address the following problems:

1. **Complex and computationally intensive metagenomic data analysis**:
   - Metagenomic data analysis usually requires specialized bioinformatics skills, substantial computing resources, and long run times. These factors limit its use in resource-limited settings.
   - By providing a cloud-based, no-code platform, the CZ ID mNGS Nanopore pipeline lets researchers process and analyze long-read data without a programming background or expensive computing equipment.
2. **Improving detection in complex microbial communities**:
   - Long-read sequencing technologies such as Oxford Nanopore produce reads that span much longer stretches of a genome than short-read technologies. This helps in assembling complex metagenomes and identifying structural variations (such as insertions, deletions, and rearrangements), which matter for the study of pathogen virulence and antibiotic resistance.
   - The new pipeline aims to detect the species in a microbial community accurately and sensitively through host-removal filtering, de novo assembly, and alignment steps. It is benchmarked on a standard mock microbial community dataset against Kraken2.
3. **Detecting novel and highly divergent viruses**:
   - The paper evaluates the pipeline's ability to detect divergent viruses using simulated datasets, assesses the detection limit for a virus spiked into a cell line (as a proxy for clinical samples), and detects known and novel viruses in previously characterized mosquito samples.
   - These capabilities are important for quickly identifying and characterizing emerging pathogens, especially during an outbreak.
In summary, by introducing the CZ ID mNGS Nanopore pipeline, this paper aims to provide researchers with a powerful and easy-to-use tool to meet the challenges of metagenomic data analysis and to promote the study of microbial communities, especially viruses.
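
To make the mock-community benchmarking idea concrete, here is a minimal sketch of how the taxa reported by a pipeline could be scored against a community of known composition. This is an illustration only: the function `detection_metrics`, the choice of precision, recall, and F1 as metrics, and the example species lists are all assumptions, not the paper's actual benchmarking procedure or data.

```python
def detection_metrics(expected, detected):
    """Score a set of detected taxa against the known members of a mock community.

    Hypothetical helper for illustration; not part of CZ ID or Kraken2.
    """
    expected, detected = set(expected), set(detected)
    true_pos = len(expected & detected)    # community members that were reported
    false_pos = len(detected - expected)   # reported taxa not in the community
    false_neg = len(expected - detected)   # community members that were missed

    precision = true_pos / (true_pos + false_pos) if detected else 0.0
    recall = true_pos / (true_pos + false_neg) if expected else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up example: a four-member mock community and one tool's species calls.
mock_community = {
    "Escherichia coli",
    "Listeria monocytogenes",
    "Bacillus subtilis",
    "Saccharomyces cerevisiae",
}
tool_calls = {
    "Escherichia coli",
    "Listeria monocytogenes",
    "Bacillus subtilis",
    "Pseudomonas fluorescens",  # a false positive in this invented example
}

metrics = detection_metrics(mock_community, tool_calls)
print(metrics)
```

Running the same function on the output of two tools (for example, species lists exported from each) would give a like-for-like comparison at a chosen taxonomic rank. Abundance-aware metrics would need read or base counts per taxon, which this sketch deliberately leaves out.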