Communication-Efficient and Byzantine-Robust Distributed Stochastic Learning with Arbitrary Number of Corrupted Workers

Jian Xu, Xinyi Tong, Shao-Lun Huang
DOI: https://doi.org/10.1109/icc45855.2022.9838792
2022-01-01
Abstract: Distributed implementations of gradient-based algorithms have been essential for training large machine learning models on massive datasets. However, distributed learning algorithms face several challenges, including communication costs, straggler issues, and attacks from Byzantine adversaries. Existing works on attack-resilient distributed learning, e.g., the coordinate-wise median of gradients, usually neglect communication and/or straggler issues and fail to defend against well-crafted attacks. Moreover, those methods are ineffective when more than half of the workers are corrupted by a Byzantine adversary. To tackle these challenges simultaneously, we develop a robust gradient aggregation framework that is compatible with gradient compression and straggler mitigation techniques. Our proposed framework requires the parameter server to maintain an honest gradient as a reference at each iteration, so that it can compute a trust score and a similarity measure for each received gradient and tolerate an arbitrary number of corrupted workers. We also provide a convergence analysis of our method for non-convex optimization problems. Finally, image classification experiments on the Fashion-MNIST dataset are conducted under various Byzantine attacks and gradient sparsification operations, and the numerical results demonstrate the effectiveness of our proposed strategy.
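The reference-gradient idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual trust-score or similarity formulas, so the scoring rule below (cosine similarity to the reference, clipped at zero, with each accepted gradient rescaled to the reference norm) is an assumption chosen for illustration, and the names `robust_aggregate`, `reference`, and `worker_grads` are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def robust_aggregate(reference, worker_grads):
    """Weight each worker gradient by its clipped cosine similarity to the
    server's reference gradient (an assumed scoring rule, not the paper's)."""
    ref_norm = norm(reference)
    weighted = [0.0] * len(reference)
    total = 0.0
    scores = []
    for g in worker_grads:
        g_norm = norm(g)
        if g_norm == 0.0 or ref_norm == 0.0:
            score = 0.0
        else:
            # Gradients pointing away from the reference get zero trust.
            score = max(0.0, dot(reference, g) / (ref_norm * g_norm))
        scores.append(score)
        if score > 0.0:
            # Rescale to the reference norm so large-magnitude gradients
            # cannot dominate the aggregate.
            scale = ref_norm / g_norm
            for i, x in enumerate(g):
                weighted[i] += score * scale * x
            total += score
    if total == 0.0:
        # No worker is trusted: fall back to the reference gradient alone.
        return list(reference), scores
    return [w / total for w in weighted], scores


# Three of the four workers send adversarial or uninformative gradients.
reference = [1.0, 0.0]
workers = [[2.0, 0.0], [-5.0, 0.0], [-3.0, 0.0], [0.0, 4.0]]
aggregate, scores = robust_aggregate(reference, workers)
print(scores)     # [1.0, 0.0, 0.0, 0.0]
print(aggregate)  # [1.0, 0.0]
```

Because every gradient is judged against the server's own reference rather than against the other workers, the aggregate in this toy case stays aligned with the honest direction even though most workers are corrupted, which a median-based rule could not guarantee.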