Variable Rate Neural Compression for Sparse Detector Data

Yi Huang, Yeonju Go, Jin Huang, Shuhang Li, Xihaier Luo, Thomas Marshall, Joseph Osborn, Christopher Pinkenburg, Yihui Ren, Evgeny Shulga, Shinjae Yoo, Byung-Jun Yoon
2024-11-19
Abstract: High-energy large-scale particle colliders generate data at extraordinary rates. Developing real-time high-throughput data compression algorithms to reduce data volume and meet the bandwidth requirement for storage has become increasingly critical. Deep learning is a promising technology that can address this challenging topic. At the newly constructed sPHENIX experiment at the Relativistic Heavy Ion Collider, a Time Projection Chamber (TPC) serves as the main tracking detector, which records three-dimensional particle trajectories in a volume of a gas-filled cylinder. In terms of occupancy, the resulting data flow can be very sparse, reaching $10^{-3}$ for proton-proton collisions. Such sparsity presents a challenge to conventional learning-free lossy compression algorithms, such as SZ, ZFP, and MGARD. In contrast, emerging deep learning-based models, particularly those utilizing convolutional neural networks for compression, have outperformed these conventional methods in terms of compression ratios and reconstruction accuracy. However, research on the efficacy of these deep learning models in handling sparse datasets, like those produced in particle colliders, remains limited. Furthermore, most deep learning models do not adapt their processing speeds to data sparsity, which affects efficiency. To address this issue, we propose a novel approach for TPC data compression via key-point identification facilitated by sparse convolution. Our proposed algorithm, BCAE-VS, achieves a $75\%$ improvement in reconstruction accuracy with a $10\%$ increase in compression ratio over the previous state-of-the-art model. Additionally, BCAE-VS manages to achieve these results with a model size over two orders of magnitude smaller. Lastly, we have experimentally verified that as sparsity increases, so does the model's throughput.
Instrumentation and Detectors, Artificial Intelligence, High Energy Physics - Experiment, Nuclear Experiment
What problem does this paper attempt to address?
The problem this paper addresses is the efficient, real-time compression of the vast amount of sparse data generated in high-energy physics experiments. Specifically, the paper focuses on the data recorded by the Time Projection Chamber (TPC) in high-energy particle collider experiments such as sPHENIX at RHIC. These data are generated at an extremely high rate (up to several petabytes per second) and are very sparse (for example, the occupancy in proton-proton collisions is about $10^{-3}$). Conventional learning-free lossy compression algorithms (such as SZ, ZFP, and MGARD) perform poorly on such sparse data, while existing deep-learning models, although they perform better, do not exploit the sparsity of the data to improve efficiency. To solve these problems, the paper proposes a new deep-learning-based compression algorithm, BCAE-VS (Bicephalous Convolutional Autoencoder with Variable Sparsity), which achieves a higher compression ratio and better reconstruction accuracy through key-point identification and sparse convolution.

The main contributions of the paper are:

1. **Introduction of BCAE-VS**: An improved bicephalous convolutional autoencoder that provides a variable compression ratio depending on the occupancy of the TPC data. By selectively down-sampling the signal instead of reducing the size of the input array, the model improves both compression efficiency and reconstruction accuracy.
2. **Improvement of reconstruction performance**: Compared with the previous state-of-the-art BCAE model, BCAE-VS improves the reconstruction accuracy by 75% and the average compression ratio by 10%.
3. **High throughput through sparse convolution**: To address the computational inefficiency of conventional convolution on highly sparse data, BCAE-VS adopts sparse convolution. This method processes only the relevant signals, significantly reducing the computational overhead of matrix multiplications involving all-zero operands.
4. **Adaptation to data sparsity**: BCAE-VS not only adjusts the compression ratio according to the sparsity of the data, but its throughput also increases as the sparsity increases.

In conclusion, this paper aims to develop a compression algorithm that can effectively process sparse TPC data generated at high rates while maintaining high reconstruction accuracy, thereby meeting the needs of next-generation streaming data acquisition systems.
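The Python sketch below illustrates two ideas from the summary above: a compression ratio that depends on occupancy, and convolution work that shrinks as the data get sparser. It is not the BCAE-VS implementation. All function names are hypothetical, selection by magnitude is a simple stand-in for the learned key-point identification, the nominal ratio ignores the cost of storing voxel positions, and the operation counts assume a single-channel convolution.

```python
import numpy as np


def occupancy(x):
    """Fraction of voxels that carry a nonzero signal."""
    return np.count_nonzero(x) / x.size


def select_key_points(x, keep_fraction):
    """Keep the largest-magnitude fraction of the nonzero voxels.

    Assumption: magnitude ranking stands in for the learned
    key-point identification described in the paper.
    """
    coords = np.argwhere(x != 0)   # (n, ndim) indices, row-major order
    values = x[x != 0]             # matching values, same order
    n_keep = max(1, int(round(keep_fraction * len(values))))
    order = np.argsort(-np.abs(values), kind="stable")[:n_keep]
    return coords[order], values[order]


def nominal_compression_ratio(x, keep_fraction):
    """Input voxels per stored value (illustrative accounting only).

    Assumption: the cost of encoding positions is ignored.
    """
    _, values = select_key_points(x, keep_fraction)
    return x.size / max(1, len(values))


def conv_macs(x, kernel_size=3):
    """Multiply-accumulate counts for one single-channel convolution.

    Dense convolution visits every voxel. A sparse convolution visits
    only active voxels, so its count is an upper bound that scales
    with the number of nonzeros rather than with the array size.
    """
    per_site = kernel_size ** x.ndim
    dense = x.size * per_site
    sparse_upper_bound = np.count_nonzero(x) * per_site
    return dense, sparse_upper_bound


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for target in (1e-1, 1e-2, 1e-3):
        # Synthetic sparse volume; not real TPC data.
        x = rng.random((32, 32, 32)) * (rng.random((32, 32, 32)) < target)
        dense, sparse = conv_macs(x)
        print(f"occupancy={occupancy(x):.4f} "
              f"ratio={nominal_compression_ratio(x, 0.5):.1f} "
              f"dense_macs={dense} sparse_macs<={sparse}")
```

With a fixed `keep_fraction`, a sparser input yields a larger nominal ratio and a smaller sparse operation count, which is the qualitative behavior the summary attributes to BCAE-VS. The actual figures reported in the paper come from a trained model and are not reproduced by this toy accounting.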