BeeZip: Towards an Organized and Scalable Architecture for Data Compression

Ruihao Gao, Zhichun Li, Guangming Tan, Xueqi Li
DOI: https://doi.org/10.1145/3620666.3651323
2024-01-01
Abstract: Data compression plays a critical role in operating systems and large-scale computing workloads. Its primary objective is to reduce network bandwidth consumption and memory/storage capacity utilization. Because it must manipulate hash tables and execute matching operations over large volumes of data, data compression software has become a resource-intensive CPU task. To tackle this challenge, numerous prior studies have introduced hardware acceleration methods. For example, they have used Content-Addressable Memory (CAM) for string matching and incorporated redundant historical copies for each matching component, among other techniques. While these methods increase compression throughput, they often compromise an essential aspect of compression performance: the compression ratio (C.R.). Moreover, hardware accelerators incur significant resource costs, especially in memory, when dealing with new large-sliding-window algorithms. We introduce BeeZip, the first hardware acceleration system designed explicitly for compression with a large sliding window. BeeZip tackles the hardware-level challenge of optimizing both compression ratio and throughput. BeeZip offers architectural support for compression algorithms with the following distinctive attributes: 1) a two-stage compression algorithm adapted for accelerator parallelism, decoupling hash parallelism from match execution dependencies; 2) an organized hash hardware accelerator, the BeeHash engine, enhanced with dynamic scheduling, which orchestrates hash processes with structured parallelism; 3) a hardware accelerator for the match process, the HiveMatch engine, which employs a new scalable parallelism approach and a heterogeneous-scale processing unit design to reduce memory resource overhead. Experimental results show that on the Silesia dataset, BeeZip achieves a peak throughput of 10.42 GB/s (at a C.R. of 2.96) and a best C.R. of 3.14 (at a throughput of 5.95 GB/s). Under similar compression ratios, BeeZip achieves speedups of 23.2× and 2.45× over single-threaded and 36-threaded software implementations, respectively. Compared with all accelerators known to us, BeeZip consistently achieves a higher compression ratio, improving it by at least 9%.
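As an illustration of the hash/match split the abstract describes, the following is a minimal software sketch of a generic LZ77-style compressor in two passes. It is not BeeZip's algorithm or hardware design: the function names, the `MIN_MATCH` constant, the `window` parameter, and the greedy match selection are all assumptions made for the example. The point it shows is that the hash pass depends only on the input bytes and so can be computed without knowing which matches are later chosen, whereas the match pass is sequential because each chosen match decides where the next search begins.

```python
from collections import defaultdict

MIN_MATCH = 4  # assumed minimum match length (hypothetical parameter)


def hash_stage(data: bytes, window: int) -> list:
    """Stage 1: for each position, collect earlier in-window positions whose
    next MIN_MATCH bytes are identical. Independent of match decisions."""
    table = defaultdict(list)          # MIN_MATCH-byte key -> earlier positions
    candidates = [[] for _ in data]
    for i in range(len(data) - MIN_MATCH + 1):
        key = data[i:i + MIN_MATCH]
        candidates[i] = [p for p in table[key] if i - p <= window]
        table[key].append(i)
    return candidates


def match_stage(data: bytes, candidates: list) -> list:
    """Stage 2: greedy parse. Emits literal byte values or (offset, length)
    pairs. Sequential: the chosen match length decides the next position."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_off = 0, 0
        for p in candidates[i]:
            n = 0
            while i + n < len(data) and data[p + n] == data[i + n]:
                n += 1
            if n > best_len:
                best_len, best_off = n, i - p
        if best_len >= MIN_MATCH:
            tokens.append((best_off, best_len))
            i += best_len
        else:
            tokens.append(data[i])
            i += 1
    return tokens


def compress(data: bytes, window: int = 1 << 20) -> list:
    return match_stage(data, hash_stage(data, window))


def decompress(tokens: list) -> bytes:
    """Inverse of match_stage; copies byte by byte so overlapping matches work."""
    buf = bytearray()
    for t in tokens:
        if isinstance(t, tuple):
            off, length = t
            for _ in range(length):
                buf.append(buf[-off])
        else:
            buf.append(t)
    return bytes(buf)
```

A larger `window` lets the hash pass propose more distant candidates, which tends to improve the compression ratio at the cost of keeping more history available, the memory pressure the abstract refers to for large sliding windows.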