How do users design scientific workflows? The Case of Snakemake

Sebastian Pohl, Nourhan Elfaramawy, Kedi Cao, Birte Kehr, Matthias Weidlich
2023-09-25
Abstract: Scientific workflows automate the analysis of large-scale scientific data, fostering the reuse of data processing operators as well as the reproducibility and traceability of analysis results. In exploratory research, however, workflows are continuously adapted, utilizing a wide range of tools and software libraries, to test scientific hypotheses. Script-based workflow engines cater to the required flexibility through direct integration of programming primitives but lack abstractions for interactive exploration of the workflow design by a user during workflow execution. To derive requirements for such interactive workflows, we conduct an empirical study on the use of Snakemake, a popular Python-based workflow engine. Based on workflows collected from 1602 GitHub repositories, we present insights on common structures of Snakemake workflows, as well as the language features typically adopted in their specification.
Other Computer Science
What problem does this paper attempt to address?
This paper studies how users design scientific workflows in practice, focusing on Snakemake, a Python-based workflow engine. It starts from the observation that exploratory research requires workflows to be adapted continuously, and that script-based engines offer this flexibility but lack abstractions that let a user interactively explore the workflow design during execution. By analyzing Snakemake workflows collected from 1602 GitHub repositories, the paper characterizes the common structures of these workflows and the language features typically used to specify them. Its aim is to derive requirements for interactive workflow systems and to fill a gap in empirical research on the structure and language features of script-based workflows.
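To make "structure and language features" more concrete, the following is a minimal sketch of how one might count rules and directives in a single Snakefile. It is not the authors' analysis method: the regular-expression approach, the function name `summarize_snakefile`, and the example Snakefile content are all assumptions made for illustration, and a line-based scan like this will miss or miscount constructs that a real parser would handle.

```python
import re
from collections import Counter

# Hypothetical Snakefile content, invented purely for illustration.
EXAMPLE_SNAKEFILE = '''\
rule all:
    input:
        "results/summary.txt"

rule align:
    input:
        "data/{sample}.fastq"
    output:
        "aligned/{sample}.bam"
    threads: 4
    shell:
        "aligner {input} > {output}"

rule summarize:
    input:
        "aligned/a.bam"
    output:
        "results/summary.txt"
    script:
        "scripts/summarize.py"
'''

# A rule declaration starts at column 0: "rule <name>:".
RULE_RE = re.compile(r"^rule\s+(\w+)\s*:", re.MULTILINE)

# A subset of Snakemake rule directives, matched only when indented.
DIRECTIVE_RE = re.compile(
    r"^[ \t]+(input|output|params|threads|resources|log|conda|"
    r"shell|run|script|wrapper)\s*:",
    re.MULTILINE,
)


def summarize_snakefile(text):
    """Return the rule names and a count of each directive found in text."""
    rules = RULE_RE.findall(text)
    directives = Counter(DIRECTIVE_RE.findall(text))
    return rules, directives


rules, directives = summarize_snakefile(EXAMPLE_SNAKEFILE)
print("rules:", rules)
print("directive counts:", dict(directives))
```

Applied across many repositories, counts of this kind would give a rough picture of which directives are commonly used, although a study at scale would need to handle includes, anonymous rules, and Python code embedded in the workflow.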