Federated Variance-Reduced Stochastic Gradient Descent With Robustness to Byzantine Attacks

Zhaoxian Wu,Qing Ling,Tianyi Chen,Georgios B. Giannakis
DOI: https://doi.org/10.1109/TSP.2020.3012952
IF: 4.875
2020-01-01
IEEE Transactions on Signal Processing
Abstract:This paper deals with distributed finite-sum optimization for learning over multiple workers in the presence of malicious Byzantine attacks. Most resilient approaches so far combine stochastic gradient descent (SGD) with different robust aggregation rules. However, the sizeable SGD-induced stochastic gradient noise challenges discerning malicious messages sent by the Byzantine attackers from noisy stochastic gradients sent by the 'honest' workers. This motivates reducing the variance of stochastic gradients as a means of robustifying SGD. To this end, a novel Byzantine attack resilient distributed (Byrd-) SAGA approach is introduced for federated learning tasks involving multiple workers. Rather than the mean employed by distributed SAGA, the novel Byrd-SAGA relies on the geometric median to aggregate the corrected stochastic gradients sent by the workers. When less than half of the workers are Byzantine attackers, Byrd-SAGA attains provably linear convergence to a neighborhood of the optimal solution, with the asymptotic learning error determined by the number of Byzantine workers. Numerical tests corroborate the robustness to various Byzantine attacks, as well as the merits of Byrd-SAGA over Byzantine attack resilient distributed SGD.
What problem does this paper attempt to address?