Improving I/O Throughput of Scientific Applications Using Transparent Parallel Compression

Tekin Bicer, Jian Yin, Gagan Agrawal
DOI: https://doi.org/10.1109/CCGrid.2014.112
2014-01-01
Abstract: The increasing number of cores in parallel computer systems is allowing scientific simulations to be executed with increasing spatial and temporal granularity. However, this also implies that increasingly large datasets need to be output, stored, managed, and then visualized and/or analyzed using a variety of methods. In examining the possibility of using compression to accelerate all of these steps, we focus on two important questions: "Can compression help save time when data is output from, or input into, a parallel program?" and "How can a scientist's effort in using compression with a parallel program be minimized?" We focus on PnetCDF and show how transparent compression can be supported, allowing an existing simulation program to start outputting and storing data in compressed form and, similarly, allowing a data analysis application to read compressed data. We address the challenges of supporting compression when parallel writes are being performed. In our experiments, we first analyze the effects of compression with microbenchmarks, and then continue our evaluation using a scientific simulation application and two data analysis applications. While we obtain up to a factor of 2 improvement in performance for the microbenchmarks, the execution time of the simulation application improves by up to 22%, and the maximum speedup of the data analysis applications is 1.83 (with an average speedup of 1.36).