SimpleMetaPipeline: Breaking the bioinformatics bottleneck in metabarcoding
Jake Williams, Nathalie Pettorelli, Rosalie Dowell, Kenneth Macdonald, Christopher Meyer, Margaux Steyaert, Sarah Tweedt, Emma Ransome
DOI: https://doi.org/10.1111/2041-210x.14434
2024-10-08
Methods in Ecology and Evolution
Abstract: The democratisation of next‐generation sequencing has vastly increased the availability of sequencing data from metabarcoding. However, to effectively prepare these metabarcoding data for subsequent analysis, researchers must consistently apply several different bioinformatic tools, including those which denoise reads, cluster sequences and assign taxonomic identities. This often creates a bioinformatics bottleneck in workflows for non‐specialists due to obstacles around: (a) integrating different tools, (b) the inability to easily modify and rerun bioinformatic pipelines involving non‐scripted ('point‐and‐click') elements and (c) the multiple outputs that may be required of a single dataset (e.g. amplicon sequence variants [ASVs] and operational taxonomic units [OTUs]), which often results in users running pipelines multiple times. Here, we introduce SimpleMetaPipeline, an open‐source bioinformatics pipeline implemented in R, which addresses these obstacles. SimpleMetaPipeline integrates the most robust and commonly used existing bioinformatic tools in a single reproducible pipeline, with a streamlined choice of parameters, to generate a sequence data table containing alternative clustering and assignment options. SimpleMetaPipeline accepts demultiplexed paired‐end and single reads from multiple sequencing runs. We describe the pipeline and demonstrate how alternative annotations enable the easy implementation of multi‐algorithm agreement tests to strengthen inferences. SimpleMetaPipeline represents a valuable addition to the existing library of pipelines, providing easy and reproducible bioinformatics, including a range of commonly desired clustering and assignment options, such as OTUs and ASVs.
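To illustrate what a multi‐algorithm agreement test over such a sequence data table might look like, the following is a minimal sketch in Python rather than the pipeline's own R code. The column names (`method_a`, `method_b`, `method_c`), the example taxa and the `min_agree` threshold are all hypothetical and are not taken from SimpleMetaPipeline; the sketch only assumes that each sequence carries one taxonomic label per assignment method, with unassigned sequences left empty.

```python
from collections import Counter

# Hypothetical table: one row per sequence, one column per assignment method.
# These names and values are illustrative, not SimpleMetaPipeline output.
METHODS = ("method_a", "method_b", "method_c")
table = [
    {"asv": "ASV_1", "method_a": "Porifera", "method_b": "Porifera", "method_c": "Porifera"},
    {"asv": "ASV_2", "method_a": "Cnidaria", "method_b": "Cnidaria", "method_c": None},
    {"asv": "ASV_3", "method_a": "Annelida", "method_b": "Mollusca", "method_c": None},
]


def consensus_assignment(assignments, min_agree=2):
    """Return the taxon supported by at least `min_agree` methods, else None.

    `assignments` maps a method name to a taxon label; None means the
    method left the sequence unassigned and casts no vote.
    """
    votes = Counter(t for t in assignments.values() if t is not None)
    if not votes:
        return None
    ranked = votes.most_common(2)
    top_taxon, top_count = ranked[0]
    # A tie for first place is treated as disagreement.
    tied = len(ranked) > 1 and ranked[1][1] == top_count
    if tied or top_count < min_agree:
        return None
    return top_taxon


def add_consensus(rows, methods=METHODS, min_agree=2):
    """Return copies of the rows with an added 'consensus' column."""
    annotated_rows = []
    for row in rows:
        calls = {m: row.get(m) for m in methods}
        annotated_rows.append({**row, "consensus": consensus_assignment(calls, min_agree)})
    return annotated_rows


annotated = add_consensus(table)
for row in annotated:
    print(row["asv"], row["consensus"])
```

Raising `min_agree` to the number of methods turns this into a strict all‐methods‐agree filter, which trades retained sequences for more conservative inferences.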
ecology