Abstract: Background: Amplicon sequencing (metabarcoding) is a common method to survey diversity of environmental communities whereby a single genetic locus is amplified and sequenced from the DNA of whole or partial organisms, organismal traces (e.g., skin, mucus, feces), or microbes in an environmental sample. Several software packages exist for analyzing amplicon data, among which QIIME 2 has emerged as a popular option because of its broad functionality, plugin architecture, provenance tracking, and interactive visualizations. However, each new analysis requires the user to keep track of input and output file names, parameters, and commands; this lack of automation and standardization is inefficient and creates barriers to meta-analysis and sharing of results.
Findings: We developed Tourmaline, a Python-based workflow that implements QIIME 2 and is built using the Snakemake workflow management system. Starting from a configuration file that defines parameters and input files (a reference database, a sample metadata file, and a manifest or archive of FASTQ sequences), it uses QIIME 2 to run either the DADA2 or Deblur denoising algorithm; assigns taxonomy to the resulting representative sequences; performs analyses of taxonomic, alpha, and beta diversity; and generates an HTML report summarizing and linking to the output files. Features include support for multiple cores, automatic determination of trimming parameters using quality scores, representative sequence filtering (taxonomy, length, abundance, prevalence, or ID), support for multiple taxonomic classification and sequence alignment methods, outlier detection, and automated initialization of a new analysis using previous settings. The workflow runs natively on Linux and macOS or via a Docker container. We ran Tourmaline on a 16S ribosomal RNA amplicon data set from Lake Erie surface water, showing its utility for parameter optimization and the ability to easily view interactive visualizations through the HTML report, QIIME 2 viewer, and R- and Python-based Jupyter notebooks.
Conclusion: Automated workflows like Tourmaline enable rapid analysis of environmental amplicon data, decreasing the time from data generation to actionable results. Tourmaline is available for download at github.com/aomlomics/tourmaline.
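The abstract mentions automatic determination of trimming parameters from quality scores without describing the procedure. The following is a minimal sketch of one plausible approach, not Tourmaline's actual implementation: it keeps leading read positions until the median quality at a position drops below a threshold. The function name `choose_truncation_length`, the threshold of 30, and the example scores are all assumptions made for illustration.

```python
from statistics import median


def choose_truncation_length(quality_by_position, min_median_quality=30):
    """Return how many leading read positions to keep.

    quality_by_position: list where element i holds the quality scores
    observed at read position i across a sample of reads (hypothetical
    input layout, not a QIIME 2 or Tourmaline data structure).
    """
    for position, scores in enumerate(quality_by_position):
        # Stop at the first position whose median quality is too low;
        # every position before it is kept.
        if median(scores) < min_median_quality:
            return position
    # No position fell below the threshold, so keep the full read length.
    return len(quality_by_position)


# Illustrative (made-up) quality scores for a 5-position read.
example_scores = [
    [38, 37, 39],
    [36, 35, 37],
    [34, 33, 30],
    [22, 25, 20],
    [15, 18, 12],
]
truncation_length = choose_truncation_length(example_scores)
```

In this example the median first falls below 30 at the fourth position, so three positions are kept. A value chosen this way could then be supplied as a truncation-length parameter in a workflow's configuration file; how Tourmaline itself derives and stores the value should be checked against its documentation.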

Dadasnake, a Snakemake implementation of DADA2 to process amplicon sequencing data for microbial ecology

Snaq: A Dynamic Snakemake Pipeline for Microbiome Data Analysis With QIIME2

RiboSnake - a user-friendly, robust, reproducible, multipurpose and documentation-extensive pipeline for 16S rRNA gene microbiome analysis

Tourmaline: A containerized workflow for rapid and iterable amplicon sequence analysis using QIIME 2 and Snakemake

DNAscan2: a versatile, scalable, and user-friendly analysis pipeline for human next-generation sequencing data

hppRNA - a Snakemake-based handy parameter-free pipeline for RNA-Seq analysis of numerous samples

Sustainable data analysis with Snakemake

Accelerating Single-Cell Sequencing Data Analysis with SciDAP: A User-Friendly Approach

MEDUSA: A Pipeline for Sensitive Taxonomic Classification and Flexible Functional Annotation of Metagenomic Shotgun Sequences

A snakemake toolkit for the batch assembly, annotation, and phylogenetic analysis of mitochondrial genomes and ribosomal genes from genome skims of museum collections

SimpleMetaPipeline: Breaking the bioinformatics bottleneck in metabarcoding

diverse-seq: an application for alignment-free selecting and clustering biological sequences

An open-sourced bioinformatic pipeline for the processing of Next-Generation Sequencing derived nucleotide reads: Identification and authentication of ancient metagenomic DNA

Pipeasm: a tool for automated large chromosome-scale genome assembly and evaluation

Fast and Simple Analysis of MiSeq Amplicon Sequencing Data with MetaAmp

WASP: a versatile, web-accessible single cell RNA-Seq processing platform

SnakeLines: integrated set of computational pipelines for sequencing reads

Mapache: a flexible pipeline to map ancient DNA

Spacemake: processing and analysis of large-scale spatial transcriptomics data

Hecatomb: an integrated software platform for viral metagenomics