Abstract: High-performance computing plays an increasingly fundamental role in advancing scientific research and engineering. However, the ever-growing scale of scientific simulations poses challenges for efficient data I/O and storage. Data compression has attracted significant attention as a way to reduce data transmission and storage costs while improving performance. In particular, the BZIP2 lossless compression algorithm is widely used because of its high compression ratio, moderate compression speed, high reliability, and open-source nature. This paper focuses on the design and realization of a parallelized BZIP2 algorithm tailored for deployment on the New-Generation Sunway supercomputing platform. By leveraging the unique cache patterns of the New-Generation Sunway processor, we propose highly tuned multi-threading and multi-node implementations of the BZIP2 applications for different scenarios. We also propose efficient BZIP2 libraries, based on the management processing element and the computing processing elements, which support the commonly used high-level (de)compression interfaces. The test results indicate that our multi-threading implementation achieves a maximum speedup of 23.09× (8.57×) in decompression (compression) compared to the sequential implementation. Furthermore, the multi-node implementation achieves 50.81% (26.35%) parallel efficiency and a peak performance of 16.6 GB/s (52.8 GB/s) for compression (decompression) when scaling up to 2048 processes.
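The general idea behind block-level parallel BZIP2 can be illustrated with a minimal sketch. It is not the Sunway implementation described in the abstract: it assumes Python's standard `bz2` module and a thread pool, and the names `compress_parallel`, `decompress`, and `BLOCK_SIZE` are hypothetical. Each chunk is compressed independently into its own stream, and the streams are concatenated in order.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

# Assumption: chunk size chosen to be close to bzip2's largest block size.
BLOCK_SIZE = 900 * 1024


def compress_parallel(data: bytes, workers: int = 4,
                      block_size: int = BLOCK_SIZE) -> bytes:
    """Compress fixed-size chunks independently and concatenate the streams."""
    chunks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the output stays ordered.
        streams = list(pool.map(bz2.compress, chunks))
    return b"".join(streams)


def decompress(blob: bytes) -> bytes:
    """Decompress a blob made of one or more concatenated bzip2 streams."""
    return bz2.decompress(blob)
```

Because every chunk becomes a separate stream, the compression ratio may be slightly lower than that of a single-stream run, which is the usual trade-off of this style of parallelization.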

IC-Data: Improving Compressed Data Processing in Hadoop.

Data Compression and Storage under High Speed Network

Reinforcement Learning Based Data Compression for Energy-Efficient Non-volatile Caches.

DataMPI: Extending MPI to Hadoop-Like Big Data Computing

A self-aware data compression system on FPGA in Hadoop

Content-Aware Partial Compression for Textual Big Data Analysis in Hadoop

An Optimized Iterative Semantic Compression Algorithm And Parallel Processing for Large Scale Data.

Parallel Data Compression Techniques

FASTA/Q Data Compressors for MapReduce-Hadoop Genomics: Space and Time Savings Made Easy -- Version 1

Compressed Data Direct Computing for Databases

CompressDB: Enabling Efficient Compressed Data Direct Processing for Various Databases

Overview of Caching Mechanisms to Improve Hadoop Performance

Towards Optimizing Storage Costs on the Cloud

Enabling Efficient Random Access to Hierarchically-Compressed Data

An Approach of Fast Data Manipulation in HDFS with Supplementary Mechanisms

Accelerating Parallel Write via Deeply Integrating Predictive Lossy Compression with HDF5

Reducing Head-of-Line Blocking on Network in Hadoop Clusters

Efficient Document Analytics on Compressed Data

Author Response for "refactoring BZIP2 on the New-Generation Sunway Supercomputer"

A Hierarchical Adaptive Spatio-Temporal Data Compression Scheme for Wireless Sensor Networks

POCLib: A High-Performance Framework for Enabling Near Orthogonal Processing on Compression