A self-aware data compression system on FPGA in Hadoop

Yubin Li, Yuliang Sun, Guohao Dai, Yuzhi Wang, Jiacai Ni, Yu Wang, Guoliang Li, Huazhong Yang
DOI: https://doi.org/10.1109/FPT.2015.7393149
2015-01-01
Abstract: With the exponential growth of data size, data storage and analysis face growing challenges from limited disk capacity and limited network bandwidth. Data compression techniques provide a good way to mitigate these effects. In this paper, we propose a self-aware data compression system on FPGA for typical data warehousing systems, such as Hive, with column-stored data and multi-threading requirements. The hardware accelerators can change the degree and hierarchy of parallelism at runtime, depending on the data to be compressed. We test the system performance on a Xilinx VC707 FPGA board, and the experimental results show that up to 16 3-parallelism accelerators can be implemented and that the throughput can be improved to as much as 432 MB/s. This is a 6.25X speedup over the software solution with the same number of threads.