Benefit of Compression in Hadoop: A Case Study of Improving IO Performance on Hadoop

Li-Hui Xiang, Li Miao, Da-Fang Zhang, Feng-Ping Chen
DOI: https://doi.org/10.2991/978-94-6239-148-2_87
2015-01-01
Abstract: As calculation accuracy improves, applications must handle ever-increasing volumes of data. Although Hadoop can process PB-scale data, IO often becomes a bottleneck. Compression reduces the size of the IO load and speeds up data transfer on disk and over the network. In Hadoop, however, the benefits of compression have not been fully exploited. We present a compression-using-policy that helps Hadoop users decide when, where, and how to use compression. Following this policy, the performance of Hadoop applications that use compression can be improved by up to 65%. We also propose an efficient way to monitor a Hadoop cluster with Ganglia, which helps balance the costs and benefits of the compression policy.
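The abstract does not spell out the policy itself, but the trade-off it targets (CPU time spent compressing versus IO time saved) can be made concrete with a simple cost model. The sketch below is an illustrative assumption, not the authors' policy: it treats one IO channel (disk or network) as having a fixed bandwidth, assumes compression pays off only when compress + transfer-compressed + decompress is faster than transferring the raw bytes, and uses Python's `zlib` as a stand-in for a Hadoop codec such as gzip or Snappy. The names `CodecProfile`, `profile_zlib`, `io_seconds`, and `should_compress` are hypothetical.

```python
import time
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class CodecProfile:
    """Hypothetical summary of how a codec behaves on a sample of job data."""
    ratio: float           # compressed size / original size
    compress_bps: float    # input bytes compressed per second
    decompress_bps: float  # original bytes restored per second


def profile_zlib(sample: bytes, level: int = 1) -> CodecProfile:
    """Measure zlib (a stand-in for a Hadoop codec) on a representative sample."""
    start = time.perf_counter()
    packed = zlib.compress(sample, level)
    t_comp = time.perf_counter() - start

    start = time.perf_counter()
    restored = zlib.decompress(packed)
    t_decomp = time.perf_counter() - start

    if restored != sample:
        raise ValueError("zlib round-trip failed")

    eps = 1e-9  # guard against a zero timer reading on very small samples
    return CodecProfile(
        ratio=len(packed) / len(sample),
        compress_bps=len(sample) / max(t_comp, eps),
        decompress_bps=len(sample) / max(t_decomp, eps),
    )


def io_seconds(n_bytes: float, io_bps: float,
               codec: Optional[CodecProfile]) -> float:
    """Estimated time to move n_bytes through one IO channel of io_bps bytes/s.

    Without a codec this is a plain transfer; with a codec it is
    compress + transfer of the smaller payload + decompress.
    """
    if codec is None:
        return n_bytes / io_bps
    return (n_bytes / codec.compress_bps
            + n_bytes * codec.ratio / io_bps
            + n_bytes / codec.decompress_bps)


def should_compress(n_bytes: float, io_bps: float, codec: CodecProfile) -> bool:
    """Compress only if the modelled end-to-end time is lower than raw IO."""
    return io_seconds(n_bytes, io_bps, codec) < io_seconds(n_bytes, io_bps, None)
```

For example, `should_compress(1e9, 10e6, profile_zlib(sample))` asks whether 1 GB crossing a 10 MB/s link is worth compressing, given a codec profile measured on `sample`. Profiling a sample rather than assuming a fixed ratio reflects that compressibility depends heavily on the data; under this model a slow network favours compression, while a fast local disk paired with a slow codec usually does not.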