Abstract: As deep learning models are usually massive and complex, distributed learning is essential for increasing training efficiency. Moreover, in many real-world application scenarios such as healthcare, distributed learning can also keep the data local and protect privacy. Recently, the asynchronous decentralized parallel stochastic gradient descent (ADPSGD) algorithm has been proposed and demonstrated to be an efficient and practical strategy in which there is no central server, so that each computing node communicates only with its neighbors. Although no raw data is transmitted across local nodes, there is still a risk of information leakage during communication, which malicious participants can exploit to mount attacks. In this paper, we present a differentially private version of the asynchronous decentralized parallel SGD framework, or A(DP)$^2$SGD for short, which maintains the communication efficiency of ADPSGD and prevents inference by malicious participants. Specifically, Rényi differential privacy is used to provide a tighter privacy analysis for our composite Gaussian mechanisms, while the convergence rate remains consistent with the non-private version. Theoretical analysis shows that A(DP)$^2$SGD converges at the optimal $\mathcal{O}(1/\sqrt{T})$ rate, the same as SGD. Empirically, A(DP)$^2$SGD achieves model accuracy comparable to the differentially private version of synchronous SGD (SSGD) but runs much faster than SSGD in heterogeneous computing environments.
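The abstract names three ingredients: decentralized neighbor-only communication, asynchronous updates, and a Gaussian mechanism applied to gradients. The following is a minimal illustrative sketch of how these pieces might fit together, not the authors' algorithm. The ring topology, the toy quadratic objectives, the clipping threshold `max_norm`, the noise scale `sigma`, and all function names are assumptions introduced here for illustration; asynchrony is only simulated by waking one random worker at a time, and no privacy accounting is performed.

```python
import math
import random


def clip(vec, max_norm):
    """Scale vec so that its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm <= max_norm:
        return list(vec)
    return [v * max_norm / norm for v in vec]


def private_gradient(grad, max_norm, sigma, rng):
    """Gaussian mechanism (assumed form): clip, then add N(0, sigma^2) noise."""
    clipped = clip(grad, max_norm)
    return [g + rng.gauss(0.0, sigma) for g in clipped]


def simulate(num_workers=4, dim=2, steps=2000, lr=0.05,
             max_norm=1.0, sigma=0.1, seed=0):
    """Sequentially simulate noisy gossip SGD on a ring of workers."""
    rng = random.Random(seed)
    # Hypothetical local objectives f_i(x) = 0.5 * ||x - c_i||^2.
    centers = [[rng.uniform(-1, 1) for _ in range(dim)]
               for _ in range(num_workers)]
    models = [[0.0] * dim for _ in range(num_workers)]
    for _ in range(steps):
        i = rng.randrange(num_workers)               # worker that wakes up
        j = (i + rng.choice([-1, 1])) % num_workers  # one ring neighbor
        # Local gradient at worker i, privatized before it is used.
        grad = [x - c for x, c in zip(models[i], centers[i])]
        noisy = private_gradient(grad, max_norm, sigma, rng)
        # Gossip step: workers i and j average their models (no server).
        avg = [(a + b) / 2 for a, b in zip(models[i], models[j])]
        models[j] = list(avg)
        # Worker i then applies its noisy gradient to the averaged model.
        models[i] = [a - lr * g for a, g in zip(avg, noisy)]
    return models, centers
```

Calling `simulate()` returns the final per-worker models together with the toy objective centers. Increasing `sigma` trades accuracy for stronger perturbation of the gradients; relating `sigma` to a concrete privacy budget would require the Rényi differential privacy analysis that the abstract refers to.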

Byzantine-Resilient Non-Convex Stochastic Gradient Descent

Efficient Byzantine-Resilient Stochastic Gradient Descent

Resilient to Byzantine attacks finite-sum optimization over networks

Federated Variance-Reduced Stochastic Gradient Descent With Robustness to Byzantine Attacks

Byzantine-Resilient Stochastic Gradient Descent for Distributed Learning: A Lipschitz-Inspired Coordinate-wise Median Approach

Byzantine-resilient decentralized stochastic gradient descent

Byzantine-robust decentralized stochastic optimization with stochastic gradient noise-independent learning error

Fall of Empires: Breaking Byzantine-tolerant SGD by Inner Product Manipulation

A(DP)$^2$SGD: Asynchronous Decentralized Parallel Stochastic Gradient Descent with Differential Privacy

Communication-Efficient and Byzantine-Robust Distributed Stochastic Learning with Arbitrary Number of Corrupted Workers

Byzantine-Robust Loopless Stochastic Variance-Reduced Gradient

Asynchronous Byzantine-Robust Stochastic Aggregation with Variance Reduction for Distributed Learning

Byzantine-Robust Distributed Learning with Compression.

Robust Decentralized Stochastic Gradient Descent over Unstable Networks.

High Dimensional Distributed Gradient Descent with Arbitrary Number of Byzantine Attackers

Asynchronous Decentralized Accelerated Stochastic Gradient Descent

Resilient Two-Time-Scale Local Stochastic Gradient Descent for Byzantine Federated Learning

Adaptive Distributed Learning with Byzantine Robustness: A Gradient-Projection-Based Method

Defending Against Saddle Point Attack in Byzantine-Robust Distributed Learning.