Abstract: High-energy large-scale particle colliders generate data at extraordinary rates. Developing real-time high-throughput data compression algorithms to reduce data volume and meet the bandwidth requirement for storage has become increasingly critical. Deep learning is a promising technology that can address this challenging topic. At the newly constructed sPHENIX experiment at the Relativistic Heavy Ion Collider, a Time Projection Chamber (TPC) serves as the main tracking detector, which records three-dimensional particle trajectories in the volume of a gas-filled cylinder. In terms of occupancy, the resulting data flow can be very sparse, reaching $10^{-3}$ for proton-proton collisions. Such sparsity presents a challenge to conventional learning-free lossy compression algorithms, such as SZ, ZFP, and MGARD. In contrast, emerging deep learning-based models, particularly those utilizing convolutional neural networks for compression, have outperformed these conventional methods in terms of compression ratios and reconstruction accuracy. However, research on the efficacy of these deep learning models in handling sparse datasets, like those produced in particle colliders, remains limited. Furthermore, most deep learning models do not adapt their processing speeds to data sparsity, which affects efficiency. To address this issue, we propose a novel approach for TPC data compression via key-point identification facilitated by sparse convolution. Our proposed algorithm, BCAE-VS, achieves a $75\%$ improvement in reconstruction accuracy with a $10\%$ increase in compression ratio over the previous state-of-the-art model. Additionally, BCAE-VS manages to achieve these results with a model size over two orders of magnitude smaller. Lastly, we have experimentally verified that as sparsity increases, so does the model's throughput.
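The link between sparsity and throughput claimed in the abstract can be illustrated with a toy operation count. The sketch below is not the BCAE-VS implementation: it is a minimal 1D example, with hypothetical function names, that compares a dense convolution (which visits every input position) with a scatter-style sparse convolution (which visits only the non-zero inputs). Both produce the same output, but the sparse version's multiply-accumulate count scales with the number of non-zero entries rather than with the array size.

```python
import random


def dense_conv1d(signal, kernel):
    """Zero-padded 'same' 1D correlation that visits every input position."""
    n, k = len(signal), len(kernel)
    half = k // 2
    out = [0.0] * n
    macs = 0  # multiply-accumulate operations performed
    for i in range(n):
        for j in range(k):
            src = i + j - half
            if 0 <= src < n:
                out[i] += signal[src] * kernel[j]
                macs += 1
    return out, macs


def sparse_conv1d(active, n, kernel):
    """Same result as dense_conv1d, computed only from non-zero inputs.

    `active` maps index -> value for the non-zero entries of a length-n signal.
    Each active input scatters its contribution to the outputs it touches.
    """
    k = len(kernel)
    half = k // 2
    out = [0.0] * n
    macs = 0
    for src, value in active.items():
        for j in range(k):
            i = src - j + half  # output index that reads `src` through tap j
            if 0 <= i < n:
                out[i] += value * kernel[j]
                macs += 1
    return out, macs


def random_sparse_signal(n, occupancy, rng):
    """Return {index: value} with roughly `occupancy` of the entries non-zero."""
    return {i: rng.random() for i in range(n) if rng.random() < occupancy}


if __name__ == "__main__":
    rng = random.Random(0)
    n, kernel = 10_000, [0.25, 0.5, 0.25]
    for occupancy in (0.1, 0.01, 0.001):
        active = random_sparse_signal(n, occupancy, rng)
        dense = [active.get(i, 0.0) for i in range(n)]
        _, dense_macs = dense_conv1d(dense, kernel)
        _, sparse_macs = sparse_conv1d(active, n, kernel)
        print(f"occupancy={occupancy}: dense MACs={dense_macs}, sparse MACs={sparse_macs}")
```

The dense count stays fixed as occupancy drops, while the sparse count shrinks in proportion to the number of active entries. Real sparse-convolution libraries add bookkeeping overhead for coordinates, so the measured speed-up in practice is smaller than this raw count suggests.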

What problem does this paper attempt to address?

The paper addresses the efficient, real-time compression of the vast amount of sparse data generated in high-energy physics experiments. Specifically, it focuses on the data recorded by the Time Projection Chamber (TPC) in high-energy particle colliders (such as the sPHENIX experiment at RHIC). These data have an extremely high generation rate (up to several petabytes per second) and are very sparse (for example, the occupancy in proton-proton collisions is about $10^{-3}$). Conventional learning-free lossy compression algorithms (such as SZ, ZFP, and MGARD) perform poorly on such sparse data, while existing deep-learning models, although they perform better, fail to exploit the sparsity of the data to improve efficiency. To solve these problems, the paper proposes a new deep-learning-based compression algorithm, BCAE-VS (Bicephalous Convolutional Autoencoder with Variable Sparsity), which achieves a higher compression ratio and better reconstruction accuracy through key-point identification and sparse convolution.

The main contributions of the paper are:

1. **Introduction of BCAE-VS**: An improved bicephalous convolutional autoencoder that provides a variable compression ratio according to the occupancy of the TPC data. By selectively down-sampling the signal instead of reducing the size of the input array, the model improves both compression efficiency and reconstruction accuracy.
2. **Improvement of reconstruction performance**: Compared with the previous state-of-the-art BCAE model, BCAE-VS improves the reconstruction accuracy by 75% and the average compression ratio by 10%.
3. **High throughput achieved by using sparse convolution**: To address the computational inefficiency of traditional convolution on highly sparse data, BCAE-VS adopts sparse convolution. This method processes only the relevant signals, significantly reducing the computational overhead of matrix multiplications involving all-zero operands.
4. **Adaptation to data sparsity**: BCAE-VS not only adjusts the compression ratio according to the sparsity of the data, but its throughput also increases significantly as the sparsity increases.

In conclusion, the paper aims to develop a compression algorithm that can effectively process sparse TPC data generated at high rates while maintaining high reconstruction accuracy, thereby meeting the needs of next-generation streaming data acquisition systems.
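The idea of a compression ratio that varies with occupancy can be sketched in a few lines. The example below is an illustration under stated assumptions, not the paper's method: the importance score is simply the voxel value (a hypothetical stand-in for a learned key-point score), the function names are invented for this sketch, and the ratio counts only the number of kept values, ignoring the cost of storing their positions.

```python
import random


def select_key_points(frame, keep_fraction):
    """Keep the highest-scoring fraction of non-zero voxels.

    `frame` maps (z, y, x) -> value for non-zero voxels only.
    Assumption: the score is the value itself; a learned model
    would supply the score in a real system.
    """
    if not frame:
        return {}
    n_keep = max(1, round(len(frame) * keep_fraction))
    ranked = sorted(frame.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:n_keep])


def compression_ratio(shape, kept):
    """Total voxel count divided by the number of kept values.

    Assumption: coordinate storage is not counted.
    """
    total = shape[0] * shape[1] * shape[2]
    return total / len(kept) if kept else float("inf")


def random_frame(shape, occupancy, rng):
    """Build a sparse frame with roughly `occupancy` of voxels non-zero."""
    frame = {}
    for z in range(shape[0]):
        for y in range(shape[1]):
            for x in range(shape[2]):
                if rng.random() < occupancy:
                    frame[(z, y, x)] = rng.randint(1, 1023)  # arbitrary positive values
    return frame


if __name__ == "__main__":
    rng = random.Random(0)
    shape = (16, 32, 32)
    for occupancy in (0.1, 0.01):
        frame = random_frame(shape, occupancy, rng)
        kept = select_key_points(frame, keep_fraction=0.5)
        print(f"occupancy={occupancy}: kept {len(kept)} of {len(frame)} voxels, "
              f"ratio={compression_ratio(shape, kept):.1f}")
```

Because the array size is fixed and the number of kept points follows the number of non-zero voxels, the ratio rises as the frame becomes sparser, which is the qualitative behavior the summary attributes to selective down-sampling.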

Variable Rate Neural Compression for Sparse Detector Data

Efficient Compression of Sparse Accelerator Data Using Implicit Neural Representations and Importance Sampling

Fast 2D Bicephalous Convolutional Autoencoder for Compressing 3D Time Projection Chamber Data

Efficient Data Compression for 3D Sparse TPC via Bicephalous Convolutional Autoencoder

Scalable Deep Convolutional Neural Networks for Sparse, Locally Dense Liquid Argon Time Projection Chamber Data

Scalable Hybrid Learning Techniques for Scientific Data Compression

On-Sensor Data Filtering using Neuromorphic Computing for High Energy Physics Experiments

NeurLZ: On Enhancing Lossy Compression Performance based on Error-Controlled Neural Learning for Scientific Data

SRN-SZ: Deep Leaning-Based Scientific Error-bounded Lossy Compression with Super-resolution Neural Networks

Polynomial data compression for large-scale physics experiments

Efficient Neural Network Compression Inspired by Compressive Sensing.

Scalable, Proposal-free Instance Segmentation Network for 3D Pixel Clustering and Particle Trajectory Reconstruction in Liquid Argon Time Projection Chambers

A reconfigurable neural network ASIC for detector front-end data compression at the HL-LHC

ECNR: Efficient Compressive Neural Representation of Time-Varying Volumetric Datasets

Exploring Structural Sparsity in Neural Image Compression

SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks

A Lightweight Recurrent Learning Network for Sustainable Compressed Sensing

LCP: Enhancing Scientific Data Management with Lossy Compression for Particles

Minimizing Area and Energy of Deep Learning Hardware Design Using Collective Low Precision and Structured Compression

Layerwise Sparse Coding for Pruned Deep Neural Networks with Extreme Compression Ratio.

Sparse Tensor-based Point Cloud Attribute Compression