Abstract: In the paradigm of decentralized learning, a group of agents collaborate to learn a global model using a distributed dataset without a central server; nevertheless, it is severely challenged by the heterogeneity of the data distribution across the agents. For example, the data may not be independent and identically distributed (non-IID), and may even be noisy or poisoned. To address these data challenges, we propose ROSS, a novel robust decentralized stochastic learning algorithm based on Shapley values. Specifically, in each round, each agent aggregates the cross-gradient information from its neighbors, i.e., the derivatives of its local model with respect to the datasets of its neighbors, to update its local model in a momentum-like manner, with the derivatives weighted according to their contributions as measured by Shapley values. We perform a solid theoretical analysis to reveal the linear convergence speedup of our ROSS algorithm. We also verify the efficacy of our algorithm through extensive experiments on public datasets. Our results demonstrate that, in the face of the above variety of data challenges, our ROSS algorithm has obvious advantages over existing state-of-the-art proposals in terms of both convergence and prediction accuracy.

What problem does this paper attempt to address?

This paper attempts to address the challenges brought by data heterogeneity in decentralized learning. Specifically, the paper focuses on how to improve the robustness of decentralized learning algorithms against various data challenges, such as unbalanced data distribution, noise attacks, and poisoning attacks.

### Problem Background

In the decentralized learning framework, multiple agents collaborate to learn a global model without relying on a central server. However, the data held by different agents may be non-IID (not independent and identically distributed), noisy, or contaminated, which poses a serious challenge to decentralized learning. Existing research mainly focuses on centralized distributed learning (such as federated learning), and relatively little research has been done on these problems in the context of decentralized learning.

### Solutions Proposed in the Paper

To solve the above-mentioned problems, the paper proposes ROSS (RObust decentralized Stochastic learning based on Shapley values), a robust decentralized stochastic learning algorithm. The main innovations of this algorithm are as follows:

1. **Using Shapley-value-weighted gradients**: Each agent weights and aggregates local gradients and cross-gradients according to the contributions of its neighbors (measured by Shapley values), thereby updating the local model more accurately.
2. **Momentum update mechanism**: Agents update the local model in a momentum-like manner to accelerate convergence.
3. **Theoretical analysis and experimental verification**: Through rigorous theoretical analysis, the linear convergence speedup of the ROSS algorithm is proven, and extensive experiments demonstrate its effectiveness and superiority under various data challenges.

### Specific Implementation Steps

- **Initialization**: Each agent initializes local model parameters \(x_i^{[0]}\) and momentum buffer \(u_i^{[0]} = 0\).
- **Each iteration**:
  1. Each agent calculates the local stochastic gradient \(g_i^{[t],i}\).
  2. Each agent sends its local model \(x_i^{[t-1]}\) to its neighbors.
  3. After receiving the local models of its neighbors, each agent calculates the cross-gradient \(g_i^{[t],j}\) and sends it back to the neighbors.
  4. Update the local model \(x_i^{[t],j}\) according to the received gradients.
  5. Calculate the Shapley value \(\phi_i^{[t],j}\) and normalize it.
  6. Calculate the weight \(\pi_i^{[t],j}\) according to the normalized Shapley value, and take the weighted average of the received stochastic gradients.
  7. Update the momentum \(\hat{u}_i^{[t]}\) and the local model \(\hat{x}_i^{[t]}\).
  8. Send the updated momentum and model to neighbors and receive the update results from neighbors.
  9. Finally, update the local model \(x_i^{[t]}\) and momentum \(u_i^{[t]}\).

### Theoretical Analysis

Under several assumptions (such as smoothness of the objective function, bounded variance of data heterogeneity, and a doubly stochastic adjacency matrix for the communication graph), the paper proves the convergence of the ROSS algorithm and gives detailed mathematical derivations. Specifically, Theorem 1 reveals that the average gradient norm of the ROSS algorithm is mainly determined by an upper bound on the difference between the initial value and the optimal value of the objective function.

### Experimental Verification

The paper conducts extensive experiments on multiple public datasets to verify the effectiveness of the ROSS algorithm in the face of various data challenges. In terms of convergence speed and prediction accuracy in particular, it outperforms the existing state-of-the-art methods.

Through these innovations and verifications, the paper demonstrates the potential and advantages of the ROSS algorithm in decentralized learning.
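
The weighting and momentum steps of one iteration can be sketched from a single agent's point of view as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the quadratic `loss`, the toy gradients, the utility function (loss reduction after a trial step with the coalition's mean gradient), the clip-and-rescale normalization, and the heavy-ball update are all hypothetical stand-ins, and the exchange of models and momentum with neighbors is omitted. Shapley values are computed exactly by enumerating coalitions, which is only practical for a handful of neighbors.

```python
import itertools
import math

import numpy as np


def shapley_values(players, utility):
    """Exact Shapley values via enumeration of all coalitions."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Standard Shapley weight for coalitions of size k.
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for subset in itertools.combinations(others, k):
                total += weight * (utility(subset + (p,)) - utility(subset))
        phi[p] = total
    return phi


def normalize(phi):
    """Assumed normalization: clip negatives to zero, then rescale to sum to 1."""
    clipped = {p: max(v, 0.0) for p, v in phi.items()}
    total = sum(clipped.values())
    if total == 0.0:
        return {p: 1.0 / len(phi) for p in phi}  # fall back to uniform weights
    return {p: v / total for p, v in clipped.items()}


def loss(x):
    """Hypothetical local objective: 0.5 * ||x||^2, minimized at the origin."""
    return 0.5 * float(x @ x)


x = np.array([1.0, -2.0])   # current local model
u = np.zeros_like(x)        # momentum buffer, initialized to zero
lr, beta = 0.1, 0.9         # assumed step size and momentum coefficient

# Toy gradients: the agent's own, one honest neighbor, one poisoned neighbor.
grads = {
    "self": np.array([1.0, -2.0]),
    "nbr_a": np.array([0.9, -2.2]),
    "nbr_poisoned": np.array([-3.0, 6.0]),
}


def utility(coalition):
    """Assumed utility: loss reduction after a trial step with the mean gradient."""
    if not coalition:
        return 0.0
    g = np.mean([grads[p] for p in coalition], axis=0)
    return loss(x) - loss(x - lr * g)


phi = shapley_values(list(grads), utility)
weights = normalize(phi)

# Weighted gradient aggregation followed by a heavy-ball momentum step.
g_weighted = sum(weights[p] * grads[p] for p in grads)
u_new = beta * u + g_weighted
x_new = x - lr * u_new
```

In this toy setting the poisoned gradient increases the loss in every coalition it joins, so its Shapley value is negative and the assumed normalization assigns it zero weight; the update is then driven only by the two useful gradients.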

ROSS: RObust decentralized Stochastic learning based on Shapley values

RSA: Byzantine-Robust Stochastic Aggregation Methods for Distributed Learning from Heterogeneous Datasets

Asynchronous Byzantine-Robust Stochastic Aggregation with Variance Reduction for Distributed Learning

ShapleyFL: Robust Federated Learning Based on Shapley Value

Optimizing Federated Learning on Non-IID Data Using Local Shapley Value

A Graph Neural Network Based Decentralized Learning Scheme

Byzantine-robust decentralized stochastic optimization with stochastic gradient noise-independent learning error

Robust softmax aggregation on blockchain based federated learning with convergence guarantee

Shuffle Private Decentralized Convex Optimization

Learning-Augmented Decentralized Online Convex Optimization in Networks

Decentralized Federated Learning with Unreliable Communications

Decentralized Online Learning: Take Benefits from Others’ Data without Sharing Your Own to Track Global Trend

Efficient Byzantine-Resilient Stochastic Gradient Desce

Communication-Efficient and Byzantine-Robust Distributed Stochastic Learning with Arbitrary Number of Corrupted Workers

Byzantine-resilient Decentralized Stochastic Gradient Descent

Robust Distributed Learning Against Both Distributional Shifts and Byzantine Attacks

Integrating Staleness and Shapley Value Consistency for Efficient K-Asynchronous Federated Learning

Prox-DBRO-VR: A Unified Analysis on Decentralized Byzantine-Resilient Composite Stochastic Optimization with Variance Reduction and Non-Asymptotic Convergence Rates

On the Tradeoff between Privacy Preservation and Byzantine-Robustness in Decentralized Learning

Faster Convergence with Less Communication: Broadcast-Based Subgraph Sampling for Decentralized Learning over Wireless Networks

Scalable Data Point Valuation in Decentralized Learning