Abstract: In the paradigm of decentralized learning, a group of agents collaborate to learn a global model using a distributed dataset without a central server; nevertheless, it is severely challenged by the heterogeneity of the data distribution across the agents. For example, the data may not be independent and identically distributed (non-IID), and may even be noisy or poisoned. To address these data challenges, we propose ROSS, a novel robust decentralized stochastic learning algorithm based on Shapley values. Specifically, in each round, each agent aggregates the cross-gradient information from its neighbors, i.e., the derivatives of its local model with respect to the datasets of its neighbors, to update its local model in a momentum-like manner, while we innovate by weighting the derivatives according to their contributions as measured by Shapley values. We perform a solid theoretical analysis to reveal the linear convergence speedup of our ROSS algorithm. We also verify the efficacy of our algorithm through extensive experiments on public datasets. Our results demonstrate that, in the face of the above variety of data challenges, our ROSS algorithm has obvious advantages over existing state-of-the-art proposals in terms of both convergence and prediction accuracy.
What problem does this paper attempt to address?
This paper attempts to address the challenges brought by data heterogeneity in decentralized learning. Specifically, the paper focuses on how to improve the robustness of decentralized learning algorithms against various data challenges, such as unbalanced data distribution, noise attacks, and poisoning attacks.
### Problem Background
In the decentralized learning framework, multiple agents collaborate to learn a global model without relying on a central server. However, the data held by different agents may be non-IID (not independent and identically distributed), noisy, or contaminated, which poses a serious challenge to decentralized learning. Existing research mainly focuses on centralized distributed learning (such as federated learning), and relatively little research has been done on these problems in the context of decentralized learning.
### Solutions Proposed in the Paper
To solve the above-mentioned problems, the paper proposes a robust decentralized stochastic learning algorithm based on Shapley values, called ROSS (RObust decentralized Stochastic learning based on Shapley values). The main innovations of this algorithm are as follows:
1. **Using Shapley-value-weighted gradients**: Each agent weights and aggregates local gradients and cross-gradients according to the contributions of its neighbors (measured by Shapley values), thereby updating the local model more accurately.
2. **Momentum update mechanism**: Agents update the local model in a momentum-like manner to accelerate convergence.
3. **Theoretical analysis and experimental verification**: A rigorous theoretical analysis establishes the linear convergence speedup of the ROSS algorithm, and extensive experiments demonstrate its effectiveness and superiority under various data challenges.
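To make the Shapley-value weighting in item 1 concrete, the following minimal sketch computes exact Shapley values for a small set of neighbors and turns them into aggregation weights. The utility function `toy_utility`, the `quality` scores, and the clip-and-rescale rule in `normalize` are all assumptions made for illustration; the summary does not state which utility or normalization ROSS uses.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, utility):
    """Exact Shapley values for a small player set.

    `utility` maps a frozenset of players to a real number.
    The cost grows exponentially with len(players).
    """
    n = len(players)
    values = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Shapley weight of a coalition of size k that excludes p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # Marginal contribution of p when joining coalition s.
                total += weight * (utility(s | {p}) - utility(s))
        values[p] = total
    return values


def normalize(values):
    """Assumed rule: clip negative values to zero, then rescale to sum to one."""
    clipped = {p: max(v, 0.0) for p, v in values.items()}
    z = sum(clipped.values())
    if z == 0.0:
        return {p: 1.0 / len(clipped) for p in clipped}
    return {p: v / z for p, v in clipped.items()}


# Hypothetical per-neighbor usefulness; "c" stands in for a poisoned neighbor.
quality = {"a": 0.5, "b": 0.3, "c": -0.2}


def toy_utility(coalition):
    # Additive toy utility, so each Shapley value equals the player's own score.
    return sum(quality[p] for p in coalition)


phi = shapley_values(list(quality), toy_utility)
pi = normalize(phi)
```

Under this toy utility, the neighbor with a negative contribution receives zero weight, which is the intuition behind using contribution scores to down-weight noisy or poisoned gradients.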
### Specific Implementation Steps
- **Initialization**: Each agent initializes local model parameters \(x_i^{[0]}\) and momentum buffer \(u_i^{[0]} = 0\).
- **Each iteration**:
1. Each agent calculates the local stochastic gradient \(g_i^{[t],i}\).
2. Each agent sends its local model \(x_i^{[t-1]}\) to its neighbors.
3. After receiving the local models of its neighbors, each agent calculates the cross-gradient \(g_i^{[t],j}\) and sends it back to the neighbors.
4. Update the local model \(x_i^{[t],j}\) according to the received gradients.
5. Calculate the Shapley value \(\phi_i^{[t],j}\) and normalize it.
6. Calculate the weight \(\pi_i^{[t],j}\) according to the normalized Shapley value, and perform weighted averaging on the received stochastic gradients.
7. Update the momentum \(\hat{u}_i^{[t]}\) and the local model \(\hat{x}_i^{[t]}\).
8. Send the updated momentum and model to neighbors and receive the update results from neighbors.
9. Finally, update the local model \(x_i^{[t]}\) and momentum \(u_i^{[t]}\).
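The weighted aggregation, momentum update, and neighbor exchange in steps 6 to 9 can be sketched roughly as follows. This is not the paper's exact update rule: the heavy-ball form `beta * u + g`, the gossip averaging with a mixing matrix `W`, the function name `ross_like_step`, and the quadratic toy losses are all assumptions chosen for illustration.

```python
import numpy as np


def ross_like_step(x, u, grads, pi, W, lr=0.1, beta=0.9):
    """One synchronous round for all agents (illustrative sketch).

    x, u  : (n, d) arrays of local models and momentum buffers.
    grads : (n, n, d) array; grads[i, j] is the gradient of agent i's model
            evaluated on agent j's data (grads[i, i] is the local gradient).
    pi    : (n, n) row-stochastic weights, e.g. from normalized Shapley values.
    W     : (n, n) doubly stochastic mixing matrix (assumed gossip step).
    """
    # Step 6: weighted average of the gradients each agent received.
    g = np.einsum("ij,ijd->id", pi, grads)
    # Step 7: intermediate momentum and model (assumed heavy-ball form).
    u_hat = beta * u + g
    x_hat = x - lr * u_hat
    # Steps 8-9: average intermediate models and momenta over neighbors.
    return W @ x_hat, W @ u_hat


# Toy problem: agent j holds the loss 0.5 * ||x - c_j||^2,
# so the cross-gradient of model x_i on agent j's data is x_i - c_j.
rng = np.random.default_rng(0)
n, d = 3, 2
centers = rng.normal(size=(n, d))
x = np.zeros((n, d))
u = np.zeros((n, d))
pi = np.full((n, n), 1.0 / n)  # uniform weights stand in for Shapley weights
W = np.full((n, n), 1.0 / n)   # fully connected graph with uniform mixing

for _ in range(400):
    grads = x[:, None, :] - centers[None, :, :]
    x, u = ross_like_step(x, u, grads, pi, W)
```

With uniform weights every agent's model approaches the mean of the `centers`, the minimizer of the average loss. Replacing `pi` with contribution-based weights would shift each agent toward the neighbors it scores as more useful.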
### Theoretical Analysis
Under a set of assumptions (such as smoothness of the objective functions, bounded variance of data heterogeneity, and a doubly stochastic adjacency matrix for the communication graph), the paper proves the convergence of the ROSS algorithm and gives detailed mathematical derivations. Specifically, Theorem 1 reveals that the average gradient norm of the ROSS algorithm is governed mainly by an upper bound on the gap between the initial value and the optimal value of the objective function.
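For orientation, two of the assumptions named above are commonly written as follows in the decentralized optimization literature. The symbols \(f_i\), \(L\), \(W\), \(N\), and \(T\) are notation assumed here rather than taken from the paper, whose exact statements may differ.

\[
\|\nabla f_i(x) - \nabla f_i(y)\| \le L \,\|x - y\| \quad \text{(smoothness)}, \qquad
W \mathbf{1} = \mathbf{1}, \quad \mathbf{1}^\top W = \mathbf{1}^\top, \quad W_{ij} \ge 0 \quad \text{(doubly stochastic)}.
\]

In this literature, a linear convergence speedup conventionally means that the leading term of the bound on the average squared gradient norm scales as \(O(1/\sqrt{NT})\) for \(N\) agents and \(T\) iterations, so the number of iterations needed to reach a given accuracy shrinks in proportion to \(N\).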
### Experimental Verification
The paper conducts extensive experiments on multiple public datasets to verify the effectiveness of the ROSS algorithm under various data challenges. In these experiments, ROSS outperforms existing state-of-the-art methods in both convergence speed and prediction accuracy.
Through these innovations and verifications, the paper demonstrates the potential and advantages of the ROSS algorithm in decentralized learning.