pyBedGraph: a Python Package for Fast Operations on 1D Genomic Signal Tracks

Henry B. Zhang, Minji Kim, Jeffrey H. Chuang, Yijun Ruan
DOI: https://doi.org/10.1093/bioinformatics/btaa061
Impact factor: 5.8
2020-01-01
Bioinformatics
Abstract:
Motivation: Modern genomic research is driven by next-generation sequencing experiments. ChIP-seq and ChIA-PET generate coverage files for transcription factor binding, while DHS and ATAC-seq yield coverage files for chromatin accessibility. These files are stored either in the bedGraph text format or in the bigWig binary format. Obtaining summary statistics for a given region is a fundamental task when analyzing protein binding intensity or chromatin accessibility. However, the existing Python package for operating on coverage files is not optimized for speed.
Results: We developed pyBedGraph, a Python package that quickly obtains summary statistics for a given interval in a bedGraph or bigWig file. When tested on 12 ChIP-seq, ATAC-seq, RNA-seq and ChIA-PET datasets, pyBedGraph is on average 260 times faster than the existing program pyBigWig. On average, pyBedGraph can look up the exact mean signal of 1 million regions in ~0.26 s and can compute their approximate means in <0.12 s on a conventional laptop.
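The abstract contrasts an exact mean lookup with a faster approximate mean. The sketch below shows one common way to implement both ideas with NumPy. It is illustrative only and is not pyBedGraph's API or its published algorithm. It assumes 0-based, half-open, non-overlapping bedGraph records and treats uncovered bases as zero. The function names (`build_signal`, `exact_means`, `approx_means`) and the fixed-bin approximation are hypothetical choices made for this example.

```python
import numpy as np


def build_signal(records, chrom_size):
    """Expand bedGraph-style (start, end, value) records into a per-base array.

    Assumption of this sketch: 0-based, half-open, non-overlapping intervals;
    bases not covered by any record are treated as 0.
    """
    signal = np.zeros(chrom_size, dtype=np.float64)
    for start, end, value in records:
        signal[start:end] = value
    return signal


def exact_means(signal, starts, ends):
    """Exact mean of signal over each [start, end) using a prefix-sum array.

    After O(n) preprocessing, each query is O(1), and all queries are
    answered in a single vectorized NumPy expression.
    """
    prefix = np.concatenate(([0.0], np.cumsum(signal)))  # prefix[i] = sum(signal[:i])
    starts = np.asarray(starts)
    ends = np.asarray(ends)
    return (prefix[ends] - prefix[starts]) / (ends - starts)


def approx_means(signal, starts, ends, bin_size=100):
    """Approximate mean from fixed-size bin sums only (hypothetical scheme).

    Whole bins inside a query contribute their exact sums; the partial bins
    at either end are assumed to be flat at their bin mean. Bin sums are
    computed from `signal` here for convenience; a memory-saving version
    would store only `bin_sums`.
    """
    chrom_size = len(signal)
    bin_starts = np.arange(0, chrom_size, bin_size)
    bin_sums = np.add.reduceat(signal, bin_starts)        # sum of each bin
    bin_lens = np.minimum(bin_size, chrom_size - bin_starts)  # last bin may be short
    bin_means = bin_sums / bin_lens
    bin_prefix = np.concatenate(([0.0], np.cumsum(bin_sums)))

    starts = np.asarray(starts)
    ends = np.asarray(ends)
    b0 = starts // bin_size          # bin holding the first base
    b1 = (ends - 1) // bin_size      # bin holding the last base
    head = np.minimum(ends, (b0 + 1) * bin_size) - starts  # query bases in first bin
    tail = ends - b1 * bin_size                            # query bases in last bin
    inner = bin_prefix[b1] - bin_prefix[np.minimum(b0 + 1, b1)]  # whole bins between
    total = bin_means[b0] * head + np.where(b1 > b0, inner + bin_means[b1] * tail, 0.0)
    return total / (ends - starts)


# Toy bedGraph rows on a 300 bp "chromosome" (made-up data for illustration).
records = [(0, 50, 2.0), (50, 250, 4.0), (250, 300, 1.0)]
signal = build_signal(records, chrom_size=300)
starts = np.array([0, 40, 120])
ends = np.array([100, 260, 300])
exact = exact_means(signal, starts, ends)
approx = approx_means(signal, starts, ends, bin_size=100)
print(exact, approx)
```

On this toy input, the first and third queries come out identical under both methods. The first query is a single whole bin, and the third query's partial bin lies in a region of constant signal. The second query is underestimated (about 3.32 vs. 3.77) because its partial end bins are not flat. In this sketch, smaller bins trade memory and preprocessing time for accuracy. With `bin_size=1`, the approximation reduces to the exact prefix-sum answer.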
What problem does this paper attempt to address?