A fast machine learning dataloader for epigenetic tracks from BigWig files

Joren Sebastian Retel, Andreas Poehlmann, Josh Chiou, Andreas Steffen, Djork-Arné Clevert
DOI: https://doi.org/10.1093/bioinformatics/btad767
IF: 5.8
2024-01-01
Bioinformatics
Abstract: Summary: We created bigwig-loader, a data loader for epigenetic profiles from BigWig files that decompresses and processes information for multiple intervals from multiple BigWig files in parallel. This is the access pattern needed to create training batches for typical machine learning models on epigenetics data. Using a new codec, decompression can be done on a graphics processing unit (GPU), making it fast enough to create training batches during training and avoiding the need to save preprocessed training examples to disk. Availability and implementation: The bigwig-loader installation instructions and source code are available at https://github.com/pfizer-opensource/bigwig-loader
biochemical research methods, biotechnology & applied microbiology, mathematical & computational biology
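
The access pattern named in the abstract (many intervals read from many tracks to form one training batch) can be illustrated with a minimal sketch. This is not the bigwig-loader API and involves no BigWig decoding, decompression, or GPU work: the tracks are hypothetical in-memory lists standing in for BigWig files, and the names make_fake_tracks, sample_intervals, and build_batch are invented for illustration only.

```python
import random

def make_fake_tracks(n_tracks, chrom_sizes, seed=0):
    """Hypothetical stand-in for BigWig files: one dense list of
    per-base values per chromosome, per track. Real BigWig files
    store compressed, indexed blocks instead."""
    rng = random.Random(seed)
    return [
        {chrom: [rng.random() for _ in range(size)]
         for chrom, size in chrom_sizes.items()}
        for _ in range(n_tracks)
    ]

def sample_intervals(chrom_sizes, batch_size, window, seed=0):
    """Draw random fixed-width (chrom, start, end) intervals."""
    rng = random.Random(seed)
    chroms = list(chrom_sizes)
    intervals = []
    for _ in range(batch_size):
        chrom = rng.choice(chroms)
        start = rng.randint(0, chrom_sizes[chrom] - window)
        intervals.append((chrom, start, start + window))
    return intervals

def build_batch(tracks, intervals):
    """Read every interval from every track.
    Result is indexed as batch[interval][track][position]."""
    return [
        [track[chrom][start:end] for track in tracks]
        for chrom, start, end in intervals
    ]

chrom_sizes = {"chr1": 2000, "chr2": 1500}  # assumed toy genome
tracks = make_fake_tracks(n_tracks=3, chrom_sizes=chrom_sizes)
intervals = sample_intervals(chrom_sizes, batch_size=4, window=64)
batch = build_batch(tracks, intervals)
print(len(batch), len(batch[0]), len(batch[0][0]))  # 4 3 64
```

Each batch touches batch_size × n_tracks regions, so the number of reads grows with both the number of intervals and the number of files. That is why doing the reads and decompression in parallel, as the abstract describes, matters when batches are built during training.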