GPU-accelerated adaptive compression framework for genomics data

Guo GuiXin, Qiu Shuang, Ye ZhiQiang, Wang BingQiang, Fang Lin, Lu Mian, See Simon, Mao Rui
DOI: https://doi.org/10.1109/BigData.2013.6691572
2013-01-01
Abstract: Genomics data is being produced at an unprecedented rate, especially in the context of clinical applications and grand challenge questions. There are various types of data in genomics research, most of which are stored as plain text tables. A data compression framework tailored to this file type is introduced in this paper, featuring a combination of generic compression algorithms, GPU acceleration, and column-major storage. This approach is the first to achieve both compression and decompression rates of around 100 MB/s on commodity hardware without compromising compression ratio. By selecting appropriate compression schemes for each column of data, this framework efficiently exploits data redundancy while remaining applicable to a wide range of formats. The GPU-accelerated implementation also properly exploits the parallelism of compression algorithms. Finally, this paper presents a novel first-order Markov model based transformation, with evidence that it is at least as effective as Burrows-Wheeler and Move-To-Front in some contexts.
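The per-column scheme selection described in the abstract can be illustrated with a minimal CPU-only sketch. It is not the paper's implementation: the candidate set (Python's standard zlib, bz2 and lzma modules), the function names `compress_table` and `decompress_table`, and the tab separator are all assumptions made for illustration, and no GPU acceleration or Markov-model transformation is involved.

```python
import bz2
import lzma
import zlib

# Hypothetical candidate set; the abstract does not list the schemes the paper uses.
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def compress_table(text, sep="\t"):
    """Compress a plain-text table column by column.

    Returns a list of (codec_name, payload) pairs, one per column.
    Assumes every row has the same number of fields.
    """
    rows = [line.split(sep) for line in text.splitlines()]
    columns = zip(*rows)  # row-major -> column-major
    packed = []
    for column in columns:
        raw = "\n".join(column).encode("utf-8")
        # Try every candidate and keep the smallest output for this column.
        name, payload = min(
            ((n, compress(raw)) for n, (compress, _) in CODECS.items()),
            key=lambda item: len(item[1]),
        )
        packed.append((name, payload))
    return packed


def decompress_table(packed, sep="\t"):
    """Invert compress_table: decode each column, then rebuild the rows."""
    columns = [
        CODECS[name][1](payload).decode("utf-8").split("\n")
        for name, payload in packed
    ]
    return "\n".join(sep.join(fields) for fields in zip(*columns))
```

Storing the codec name next to each payload lets every column be decoded independently, which is also what makes a column-major layout easy to process in parallel. A trailing newline in the input is not preserved by this sketch.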