pipemake: A pipeline creation tool using Snakemake for reproducible analysis of biological datasets

Andrew E Webb,Scott W Wolf,Ian M Traniello,Sarah D Kocher
DOI: https://doi.org/10.1101/2024.12.20.629758
2024-12-24
Abstract:The exponential growth in biological data generation has created an urgent need for efficient, reproducible computational analysis workflows. Here, we present pipemake, a computational platform designed to streamline the development and implementation of efficient and reproducible Snakemake workflows. pipemake creates modular pipelines that can be seamlessly integrated or removed from the platform without requiring reconfiguration of the core system, enabling flexible adaptation of workflows to different analytical needs across diverse fields. To demonstrate the platform's capabilities, we created and implemented pipelines to reanalyze two distinct biological datasets. First, we recreated a population genomics analysis of the socially flexible halictid bee, , using pipemake-generated workflows for genome annotation, processing of variant data, dimensionality reduction, and a genome-wide association study (GWAS). We then used pipemake to analyze behavioral tracking data from the common eastern bumble bee, . In both cases, pipemake workflows produced results consistent with published findings while substantially reducing hands-on analysis time. Overall, pipemake's modular design allows researchers to easily modify existing pipelines or develop new ones without software development expertise. Beyond streamlining workflow creation, pipemake leverages the full Snakemake ecosystem to enable parallel processing, automated error recovery, and comprehensive analysis documentation. These features make pipemake an efficient and accessible solution for analyzing complex biological datasets. pipemake is freely available as a conda package or direct download at https://github.com/kocherlab/pipemake.
Bioinformatics
What problem does this paper attempt to address?