LCP: Enhancing Scientific Data Management with Lossy Compression for Particles

Longtao Zhang, Ruoyu Li, Congrong Ren, Sheng Di, Jinyang Liu, Jiajun Huang, Robert Underwood, Pascal Grosset, Dingwen Tao, Xin Liang, Hanqi Guo, Franck Cappello, Kai Zhao
2024-11-02
Abstract: Many scientific applications opt for particles instead of meshes as their basic primitives to model complex systems composed of billions of discrete entities. Such applications span a diverse array of scientific domains, including molecular dynamics, cosmology, computational fluid dynamics, and geology. The scale of the particles in those scientific applications increases substantially thanks to the ever-increasing computational power in high-performance computing (HPC) platforms. However, the actual gains from such increases are often undercut by obstacles in data management systems related to data storage, transfer, and processing. Lossy compression has been widely recognized as a promising solution to enhance scientific data management systems regarding such challenges, although most existing compression solutions are tailored for Cartesian grids and thus have sub-optimal results on discrete particle data. In this paper, we introduce LCP, an innovative lossy compressor designed for particle datasets, offering superior compression quality and higher speed than existing compression solutions. Specifically, our contribution is threefold. (1) We propose LCP-S, an error-bound aware block-wise spatial compressor to efficiently reduce particle data size. This approach is universally applicable to particle data across various domains. (2) We develop LCP, a hybrid compression solution for multi-frame particle data, featuring dynamic method selection and parameter optimization. (3) We evaluate our solution alongside eight state-of-the-art alternatives on eight real-world particle datasets from seven distinct domains. The results demonstrate that our solution achieves up to 104% improvement in compression ratios and up to 593% increase in speed compared to the second-best option, under the same error criteria.
Distributed, Parallel, and Cluster Computing; Databases
What problem does this paper attempt to address?
The paper addresses the strain that rapidly growing computational power on high-performance computing (HPC) platforms places on scientific data management systems. As compute capability grows, scientific applications can simulate at larger scale and higher precision, and the resulting data volumes exceed what existing data management systems can handle in memory, storage, and I/O. The problem is especially acute for particle data, which is widely used in fields such as molecular dynamics, cosmology, computational fluid dynamics, and geology.

Lossless compression has limited effectiveness on particle data, whereas lossy compression can reduce data volume significantly. However, most existing lossy compressors are designed for structured grids and perform poorly on particle data. The paper therefore proposes a new lossy compressor, LCP (Lossy Compressor for Particles), which aims to improve both compression ratio and speed for particle data while keeping the reconstructed data within predefined error bounds.

The main contributions of LCP are:

1. **LCP-S**: A block-based lossy spatial compression algorithm that efficiently compresses particle data within user-defined error bounds.
2. **LCP**: A hybrid compression solution that combines spatial and temporal compression techniques, dynamically selecting the compression method and optimizing parameters to maximize compression effectiveness.
3. **Performance Evaluation**: Comparative experiments on eight real-world particle datasets from seven different domains against eight state-of-the-art compressors, showing that LCP excels in compression ratio, speed, and data fidelity.

In summary, the paper targets the challenges of storing, transferring, and processing particle data in scientific data management by developing the efficient lossy compressor LCP.
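To make the phrase "block-based compression within user-defined error bounds" more concrete, the following is a minimal sketch of one generic way to combine the two ideas. It is not the LCP-S algorithm from the paper. It assumes an absolute error bound `eb`, uniform quantization with bin width `2 * eb`, and a hypothetical `block_size` parameter; all function names are illustrative. Particles are snapped to integer bins, then grouped by block so that each particle is stored as a small in-block offset. A real compressor would follow this with an entropy coder. Note that this sketch does not preserve the original particle order.

```python
from collections import defaultdict


def quantize(points, eb):
    """Snap each coordinate to the nearest multiple of 2*eb (as an integer bin).

    Rounding to the nearest bin keeps the per-coordinate error within eb,
    up to floating-point rounding.
    """
    step = 2.0 * eb
    return [tuple(round(c / step) for c in p) for p in points]


def group_into_blocks(qpoints, block_size):
    """Bucket quantized points by block id and keep only in-block offsets.

    Offsets lie in [0, block_size), so they need fewer bits than the
    full integer coordinates.
    """
    blocks = defaultdict(list)
    for q in qpoints:
        block_id = tuple(c // block_size for c in q)  # floor division
        offset = tuple(c % block_size for c in q)     # always non-negative
        blocks[block_id].append(offset)
    return dict(blocks)


def reconstruct(blocks, block_size, eb):
    """Invert group_into_blocks and map integer bins back to coordinates."""
    step = 2.0 * eb
    out = []
    for block_id, offsets in blocks.items():
        for off in offsets:
            out.append(tuple((b * block_size + o) * step
                             for b, o in zip(block_id, off)))
    return out


if __name__ == "__main__":
    pts = [(0.12, -3.413, 7.7), (0.13, -3.42, 7.693), (100.0, 2.5, -0.001)]
    blocks = group_into_blocks(quantize(pts, eb=0.01), block_size=64)
    print(reconstruct(blocks, block_size=64, eb=0.01))
```

The design choice illustrated here is that the error bound is enforced once, at quantization time; everything after that step (blocking, offset storage) is lossless on the integer bins.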