PipeVal: light-weight extensible tool for file validation

Yash Patel,Arpi Beshlikyan,Madison Jordan,Gina Kim,Aaron Holmes,Takafumi N Yamaguchi,Paul C Boutros
DOI: https://doi.org/10.1093/bioinformatics/btae079
IF: 5.8
2024-02-01
Bioinformatics
Abstract:Abstract Motivation The volume of biomedical data generated each year is growing exponentially as high-throughput molecular, imaging and mHealth technologies expand. This rise in data volume has contributed to an increasing reliance on and demand for computational methods, and consequently to increased attention to software quality and data integrity. Results To simplify data verification in diverse data-processing pipelines, we created PipeVal, a light-weight, easy-to-use, extensible tool for file validation. It is open-source, easy to integrate with complex workflows, and modularized for extensibility for new file formats. PipeVal can be rapidly inserted into existing methods and pipelines to automatically validate and verify inputs and outputs. This can reduce wasted compute time attributed to file corruption or invalid file paths, and significantly improve the quality of data-intensive software. Availability and implementation PipeVal is an open-source Python package under the GPLv2 license and it is freely available at https://github.com/uclahs-cds/package-PipeVal. The docker image is available at: https://github.com/uclahs-cds/package-PipeVal/pkgs/container/pipeval.
biochemical research methods,biotechnology & applied microbiology,mathematical & computational biology
What problem does this paper attempt to address?