gZCCL: Compression-Accelerated Collective Communication Framework for GPU Clusters
Jiajun Huang, S. Di, Xiaodong Yu, Yujia Zhai, Jinyang Liu, Yafan Huang, Kenneth Raffenetti, Hui Zhou, Kai Zhao, Zizhong Chen, F. Cappello, Yan-Hua Guo, R. Thakur
DOI: https://doi.org/10.1145/3650200.3656636
2023-08-09
Abstract: GPU-aware collective communication has become a major bottleneck for modern computing platforms as GPU computing power rapidly rises. A traditional approach is to integrate lossy compression directly into GPU-aware collectives, which can lead to serious performance issues such as underutilized GPU devices and uncontrolled data distortion. To address these issues, we propose gZCCL, the first general framework that designs and optimizes GPU-aware, compression-enabled collectives with an accuracy-aware design to control error propagation. To validate our framework, we evaluate its performance on up to 512 NVIDIA A100 GPUs with real-world applications and datasets. Experimental results demonstrate that our gZCCL-accelerated collectives, including both collective computation (Allreduce) and collective data movement (Scatter), can outperform NCCL and Cray MPI by up to 4.5× and 28.7×, respectively. Furthermore, our accuracy evaluation with an image-stacking application confirms the high reconstructed data quality of our accuracy-aware framework.
Computer Science