Abstract: Federated averaging (FedAvg) is a communication-efficient algorithm for distributed training with an enormous number of clients. In FedAvg, clients keep their data locally for privacy protection, and a central parameter server mediates communication between them: it distributes the parameters to each client and collects the updated parameters from the clients. FedAvg has mostly been studied in this centralized setting, which requires massive communication between the central server and the clients and can lead to channel blocking. Moreover, an attack on the central server can break the privacy of the whole system. Decentralization can significantly reduce the communication load on the busiest node (the central one) because every node communicates only with its neighbors. To this end, we study decentralized FedAvg with momentum (DFedAvgM), implemented on clients that are connected by an undirected graph. In DFedAvgM, all clients perform stochastic gradient descent with momentum and communicate with their neighbors only. To further reduce the communication cost, we also consider quantized DFedAvgM. The proposed algorithm involves the mixing matrix, momentum, client training with multiple local iterations, and quantization, each of which introduces extra terms in the Lyapunov analysis. The analysis in this paper is therefore much more challenging than that of previous decentralized (momentum) SGD or FedAvg. We prove convergence of the (quantized) DFedAvgM under mild assumptions; the convergence rate can be improved to sublinear when the loss function satisfies the PŁ property. Numerically, we find that the proposed algorithm outperforms FedAvg in both convergence speed and communication cost.
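
To make the structure of one communication round concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy scalar loss 0.5 * (x - target)^2 per client, deterministic gradients in place of stochastic ones, a ring graph with uniform 1/3 mixing weights, and momentum that restarts at the beginning of every round. The function names (`ring_mixing_matrix`, `local_momentum_steps`, `dfedavgm`) and all hyperparameter values are hypothetical, and quantization is omitted.

```python
def ring_mixing_matrix(m):
    """Symmetric, doubly stochastic mixing matrix for a ring of m >= 3 clients.

    Assumption: each client weights itself and its two neighbors by 1/3.
    """
    W = [[0.0] * m for _ in range(m)]
    for i in range(m):
        W[i][i] = 1.0 / 3.0
        W[i][(i - 1) % m] = 1.0 / 3.0
        W[i][(i + 1) % m] = 1.0 / 3.0
    return W


def local_momentum_steps(x, target, lr, beta, num_steps):
    """Run heavy-ball momentum steps on the toy loss 0.5 * (x - target)**2."""
    prev = x  # assumption: momentum is reset at the start of each round
    for _ in range(num_steps):
        grad = x - target  # exact gradient of the toy loss (no sampling noise)
        x, prev = x - lr * grad + beta * (x - prev), x
    return x


def dfedavgm(targets, rounds=200, lr=0.05, beta=0.5, local_steps=5):
    """Alternate local momentum training with neighbor-only averaging."""
    m = len(targets)
    W = ring_mixing_matrix(m)
    x = [0.0] * m  # one scalar parameter per client
    for _ in range(rounds):
        # Each client trains locally for several iterations on its own data.
        z = [local_momentum_steps(x[i], targets[i], lr, beta, local_steps)
             for i in range(m)]
        # Each client averages with its graph neighbors via the mixing matrix.
        x = [sum(W[i][j] * z[j] for j in range(m)) for i in range(m)]
    return x


if __name__ == "__main__":
    print(dfedavgm([1.0, 2.0, 3.0, 4.0, 5.0]))
```

In this toy setting the average of the client parameters approaches the minimizer of the average loss (the mean of the targets), while individual clients retain some disagreement because the step size is constant and each round includes several local steps. No server appears anywhere in the loop: every client only reads the values of its two ring neighbors.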
