NFTest: automated testing of Nextflow pipelines

Yash Patel, Chenghao Zhu, Takafumi N Yamaguchi, Yuan Zhe Bugh, Mao Tian, Aaron Holmes, Sorel T Fitz-Gibbon, Paul C Boutros
DOI: https://doi.org/10.1093/bioinformatics/btae081
IF: 5.8
2024-02-01
Bioinformatics
Abstract:
Motivation: The ongoing expansion in the volume of biomedical data has contributed to a growing complexity in the tools and technologies used in research with an increased reliance on complex workflows written in orchestration languages such as Nextflow to integrate algorithms into processing pipelines. The growing use of workflows involving various tools and algorithms has led to increased scrutiny of software development practices to avoid errors in individual tools and in the connections between them.
Results: To facilitate test-driven development of Nextflow pipelines, we created NFTest, a framework for automated pipeline testing and validation with customizability options for Nextflow features. It is open-source, easy to initialize and use, and customizable to allow for testing of complex workflows with test success configurable through a broad range of assertions. NFTest simplifies the testing burden on developers by automating tests once defined and providing a flexible interface for running tests to validate workflows. This reduces the barrier to rigorous biomedical workflow testing and paves the way toward reducing computational errors in biomedicine.
Availability and implementation: NFTest is an open-source Python framework under the GPLv2 license and is freely available at https://github.com/uclahs-cds/tool-NFTest. The call-sSNV Nextflow pipeline is available at: https://github.com/uclahs-cds/pipeline-call-sSNV.
biochemical research methods, biotechnology & applied microbiology, mathematical & computational biology
What problem does this paper attempt to address?
The paper introduces NFTest, an automated framework for testing and validating Nextflow bioinformatics workflows. As the volume of biomedical data grows and the tools used to analyze it become more complex, workflow orchestration languages such as Nextflow are widely used to integrate algorithms into processing pipelines. Complex workflows, however, carry a risk of software errors, both within individual tools and especially in the connections between them. NFTest aims to simplify testing by offering customization options for Nextflow features and by validating pipeline results through a broad range of assertions that together define when a test succeeds. It is open source and easy to initialize and use. Once tests are defined, NFTest runs them automatically, which reduces the burden of manual testing on developers, makes rigorous workflow testing more practical, and thereby helps reduce computational errors in biomedicine.

The authors demonstrate NFTest on two independent Nextflow pipelines: the sarek pipeline, which detects germline and somatic variants from next-generation sequencing data, and the new call-sSNV pipeline, which integrates somatic single-nucleotide variant (sSNV) calls from four different tools. These examples show that NFTest makes testing and developing complex workflows easier and more efficient, and that it has low resource requirements, without significantly increasing runtime or memory usage.
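NFTest decides whether a test succeeds through assertions on the outputs of a pipeline run. The Python sketch below is not NFTest's API or configuration format. It is a hypothetical illustration, assuming a byte-exact checksum comparison between an actual output file and a stored expected file, of how a test case can be made to pass only when all of its assertions pass. The names `md5sum`, `OutputAssertion` and `PipelineCase` are invented for this example.

```python
import hashlib
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


def md5sum(path: Path) -> str:
    """Return the hex MD5 digest of a file, reading 1 MiB chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


@dataclass
class OutputAssertion:
    """Hypothetical assertion: an output file must match a stored expected file byte for byte."""
    actual: Path
    expect: Path

    def passes(self) -> bool:
        return md5sum(self.actual) == md5sum(self.expect)


@dataclass
class PipelineCase:
    """Hypothetical test case: passes only if every one of its assertions passes.

    all() over an empty list is True, so a case with no assertions passes vacuously.
    """
    name: str
    asserts: list[OutputAssertion] = field(default_factory=list)

    def passes(self) -> bool:
        return all(a.passes() for a in self.asserts)


if __name__ == "__main__":
    # Stand-in files: in practice "actual" would come from a pipeline run
    # and "expect" from a previously validated reference output.
    with tempfile.TemporaryDirectory() as tmp:
        out_dir = Path(tmp)
        actual = out_dir / "actual.txt"
        expect = out_dir / "expect.txt"
        actual.write_bytes(b"chr1\t100\tA\tT\n")
        expect.write_bytes(b"chr1\t100\tA\tT\n")
        case = PipelineCase("hypothetical-case", [OutputAssertion(actual, expect)])
        print(f"{case.name} passed: {case.passes()}")  # prints: hypothetical-case passed: True
```

A byte-exact checksum is only one possible kind of check: it fails on any difference, including harmless ones such as a run date written into a file header, so looser comparisons may suit some outputs better.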