SGC: Similarity-Guided Gradient Compression for Distributed Deep Learning

Jingling Liu, Jiawei Huang, Yijun Li, Zhaoyi Li, Wenjun Lyu, Wenchao Jiang, Jianxin Wang
DOI: https://doi.org/10.1109/iwqos61813.2024.10682863
2024-01-01
Abstract: Collective communication has become the bottleneck of large-scale distributed deep learning because of the huge volume of gradients aggregated during training. Despite much recent progress in reducing traffic volume by compressing the stochastic gradients inside each training worker, how to exploit inter-worker data redundancy to alleviate communication overhead has remained elusive. In this paper, we reveal that most gradients are highly similar, with close values, across training workers. Based on this observation, we propose a Similarity-guided Gradient Compression framework named SGC, which skips aggregating similar gradients across workers: each worker uses its local gradient rather than the averaged value to save communication cost. SGC first quantifies the similarity of gradients among workers and then carefully adjusts the aggregation frequency of similar gradients without hurting DNN model accuracy. Meanwhile, we theoretically analyze the convergence accuracy of SGC. A comprehensive evaluation demonstrates that SGC outperforms state-of-the-art schemes by up to 47% in convergence time.
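The skip-aggregation idea described in the abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the authors' implementation: the cosine-similarity metric, the `threshold` value, and the names `aggregate_step` and `worker_grads` are hypothetical choices made for illustration, and the abstract does not specify how SGC actually measures similarity or schedules aggregation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def average(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def aggregate_step(worker_grads, threshold=0.99):
    """Simulate one training step with similarity-guided skipping.

    worker_grads maps a tensor name to a list of per-worker gradient vectors.
    Returns (applied, aggregated): applied[w][name] is the gradient worker w
    would apply, and aggregated counts the tensors that were actually averaged.

    Assumption: similarity is measured against the true mean, which is only
    possible in a simulation. A real system would have to estimate it without
    communicating the gradients it is trying to skip.
    """
    num_workers = len(next(iter(worker_grads.values())))
    applied = [{} for _ in range(num_workers)]
    aggregated = 0
    for name, grads in worker_grads.items():
        mean = average(grads)
        min_similarity = min(cosine_similarity(g, mean) for g in grads)
        if min_similarity >= threshold:
            # Gradients are similar: skip aggregation, keep the local value.
            for w in range(num_workers):
                applied[w][name] = list(grads[w])
        else:
            # Gradients differ: fall back to the usual averaged value.
            aggregated += 1
            for w in range(num_workers):
                applied[w][name] = list(mean)
    return applied, aggregated


if __name__ == "__main__":
    grads = {
        "layer_a": [[1.0, 0.0], [1.0, 0.001]],  # nearly identical across workers
        "layer_b": [[1.0, 0.0], [0.0, 1.0]],    # orthogonal across workers
    }
    applied, aggregated = aggregate_step(grads)
    print(aggregated, applied[0]["layer_a"], applied[0]["layer_b"])
```

In this toy setting only `layer_b` is averaged, so half of the tensors generate no aggregation traffic. Adjusting how often similar tensors are re-aggregated, which the abstract mentions, is left out of the sketch.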