Abstract: High-performance computing is playing an increasingly fundamental role in advancing scientific research and engineering. However, the ever-expanding scale of scientific simulations poses challenges for efficient data I/O and storage. Data compression has garnered significant attention as a way to reduce data transmission and storage costs while improving performance. In particular, the BZIP2 lossless compression algorithm has been widely used because of its excellent compression ratio, moderate compression speed, high reliability, and open-source availability. This paper focuses on the design and implementation of a parallelized BZIP2 algorithm tailored for deployment on the New-Generation Sunway supercomputing platform. By leveraging the unique cache patterns of the New-Generation Sunway processor, we propose highly tuned multi-threaded and multi-node implementations of the BZIP2 application for different scenarios. Moreover, we provide efficient BZIP2 libraries based on the management processing element and the computing processing elements, which support the commonly used high-level (de)compression interfaces. The test results indicate that our multi-threaded implementation achieves a maximum speedup of 23.09× (8.57×) in decompression (compression) compared to the sequential implementation. Furthermore, the multi-node implementation achieves 50.81% (26.35%) parallel efficiency and a peak performance of 16.6 GB/s (52.8 GB/s) for compression (decompression) when scaling up to 2048 processes.
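The abstract does not spell out the parallelization scheme. One common way to parallelize BZIP2, used by tools such as pbzip2, is to split the input into independent chunks and compress each chunk into its own bzip2 stream. The Python sketch below illustrates that general idea with the standard-library `bz2` module and a thread pool. It is an assumption for illustration only, not the Sunway implementation described in the paper; `CHUNK_SIZE`, `split_blocks`, `parallel_compress`, `parallel_decompress`, and the worker count are hypothetical names and parameters.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

# Hypothetical chunk size: 900,000 bytes matches bzip2's largest block
# size (level 9), so each chunk maps to roughly one bzip2 block.
CHUNK_SIZE = 900 * 1000


def split_blocks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut the input into independent, fixed-size chunks."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def parallel_compress(data: bytes, workers: int = 4) -> list[bytes]:
    """Compress each chunk as a separate bzip2 stream on a worker thread.

    CPython's bz2 module releases the GIL while compressing, so the
    threads can run on different cores.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so the streams come back in sequence.
        return list(pool.map(bz2.compress, split_blocks(data)))


def parallel_decompress(streams: list[bytes], workers: int = 4) -> bytes:
    """Decompress independent streams concurrently and reassemble them."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(bz2.decompress, streams))
```

Joining the returned streams with `b"".join(streams)` yields a multi-stream file that `bz2.decompress` (Python 3.3+) and the `bunzip2` command-line tool both accept. Keeping the streams as a separate list is what lets `parallel_decompress` work on them concurrently, since a standard .bz2 file does not record where each stream starts.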

Author Response for "refactoring BZIP2 on the New-Generation Sunway Supercomputer"
