Abstract: The hidden vulnerability of distributed learning systems to Byzantine attacks has been investigated in recent research and, fortunately, some known defenses can mitigate Byzantine attacks when a minority of workers is under adversarial control. Yet the community still knows very little about how to handle situations in which the proportion of malicious workers is 50% or more. Based on our preliminary study of this open challenge, we find that more can be done to restore Byzantine robustness in these more threatening situations if the auxiliary information inside the learning process is better utilized. In this paper, we propose Justinian's GAAvernor (GAA), a Gradient Aggregation Agent that learns to be robust against Byzantine attacks via reinforcement learning techniques. GAA uses its historical interactions with the workers as experience, and it uses a quasi-validation set, a small dataset of fewer than 10 data samples from similar data domains, to generate reward signals for policy learning. As a complement to existing defenses, our proposed approach does not bound the expected number of malicious workers and is proven to be robust in more challenging scenarios. Through extensive evaluations on four benchmark systems and against various adversarial settings, our proposed defense achieves robustness comparable to that of a system under no attack, even in some cases where 90% of the workers are controlled by the adversary. Meanwhile, our approach shows a level of time efficiency similar to that of state-of-the-art defenses. Moreover, GAA provides highly interpretable traces of worker behavior as by-products, which can support further mitigation such as Byzantine worker detection and behavior pattern analysis.
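
The reward idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy quadratic loss, the learning rate, and the choice of "loss decrease on the quasi-validation set" as the reward are all assumptions made for illustration. The sketch only shows why a tiny trusted dataset can score an aggregation weighting even when most workers are malicious.

```python
# Hypothetical sketch; every name and the reward definition are assumptions,
# not taken from the paper.

def aggregate(gradients, weights):
    """Weighted sum of worker gradients (weights assumed to sum to 1)."""
    dim = len(gradients[0])
    return [sum(w * g[i] for w, g in zip(weights, gradients)) for i in range(dim)]

def toy_loss(params, samples):
    """Toy loss: mean squared distance between params and each sample."""
    total = sum(sum((p - s) ** 2 for p, s in zip(params, x)) for x in samples)
    return total / len(samples)

def reward(params, weights, gradients, quasi_val, lr=0.1):
    """Assumed reward: loss decrease on the quasi-validation set after one step."""
    step = aggregate(gradients, weights)
    new_params = [p - lr * g for p, g in zip(params, step)]
    return toy_loss(params, quasi_val) - toy_loss(new_params, quasi_val)

# Ten workers: worker 0 is honest, the other nine send a harmful gradient.
params = [0.0, 0.0]
quasi_val = [[1.0, 1.0]]                      # a single trusted sample
honest = [-2.0, -2.0]                         # true gradient of toy_loss at params
byzantine = [20.0, 20.0]                      # points away from the optimum
gradients = [honest] + [byzantine] * 9

uniform = [0.1] * 10                          # plain averaging
trusting = [1.0] + [0.0] * 9                  # all weight on the honest worker

reward_uniform = reward(params, uniform, gradients, quasi_val)
reward_trusting = reward(params, trusting, gradients, quasi_val)
```

In this toy setting, plain averaging yields a negative reward because the aggregated step increases the loss on the trusted sample, while placing all weight on the honest worker yields a positive reward. A learned policy would adjust the weights to increase this signal over time; how GAA does so is not specified in the abstract.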

Federated Variance-Reduced Stochastic Gradient Descent With Robustness to Byzantine Attacks

Resilient to byzantine attacks finite-sum optimization over networks

Asynchronous Byzantine-Robust Stochastic Aggregation with Variance Reduction for Distributed Learning

Byzantine-Robust Aggregation with Gradient Difference Compression and Stochastic Variance Reduction for Federated Learning

Byzantine-robust Variance-Reduced Federated Learning over Distributed Non-I.i.d. Data

WGM-dSAGA: Federated Learning Strategies with Byzantine Robustness Based on Weighted Geometric Median

Communication-Efficient and Byzantine-Robust Distributed Stochastic Learning with Arbitrary Number of Corrupted Workers

Efficient Byzantine-Resilient Stochastic Gradient Desce

Byzantine-Robust Distributed Learning with Compression

Byzantine-robust decentralized stochastic optimization with stochastic gradient noise-independent learning error

Byzantine-Robust Loopless Stochastic Variance-Reduced Gradient

Justinian's GAAvernor: Robust Distributed Learning with Gradient Aggregation Agent

Byzantine-Resilient Non-Convex Stochastic Gradient Descent

Byzantine-resilient Decentralized Stochastic Gradient Descent

High Dimensional Distributed Gradient Descent with Arbitrary Number of Byzantine Attackers

Byzantine-resilient Federated Learning With Adaptivity to Data Heterogeneity

Byzantine-Resilient Stochastic Gradient Descent for Distributed Learning: A Lipschitz-Inspired Coordinate-wise Median Approach

Robust Distributed Learning Against Both Distributional Shifts and Byzantine Attacks

Byzantine-Robust Stochastic Gradient Descent for Distributed Low-Rank Matrix Completion

Resilient Two-Time-Scale Local Stochastic Gradient Descent for Byzantine Federated Learning

Buffered Asynchronous SGD for Byzantine Learning