Abstract: Federated Learning (FL) has emerged as a crucial distributed training paradigm, enabling discrete devices to collaboratively train a shared model under the coordination of a central server while leveraging their locally stored private data. Nonetheless, the non-independent-and-identically-distributed (Non-IID) data generated on heterogeneous clients and the incessant information exchange among participants may significantly impede training efficacy, slow model convergence, and increase the risk of privacy leakage. To alleviate the divergence between the local and average model parameters and obtain a fast model convergence rate, we propose an adaptive FEDerated learning algorithm called FedAgg, which refines conventional stochastic gradient descent (SGD) with an AGgregated Gradient term at each local training epoch and adaptively adjusts the learning rate based on a penalty term that quantifies the local model deviation. To tackle the challenge of information exchange among clients during local training and to design a decentralized adaptive learning rate for each client, we introduce two mean-field terms that approximate the average local parameters and gradients over time. Through rigorous theoretical analysis, we demonstrate the existence and convergence of the mean-field terms and provide a robust upper bound on the convergence of our proposed algorithm. Extensive experimental results on real-world datasets substantiate the superiority of our framework over existing state-of-the-art FL strategies in enhancing model performance and accelerating convergence under both IID and Non-IID datasets.
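The local update described in the abstract can be illustrated with a toy sketch. The abstract does not give the update rule, so everything concrete here is an assumption made for illustration only: the scalar quadratic client objectives, the step form `grad + phi_g`, the learning-rate rule `eta0 / (1 + lam * deviation)`, and the use of start-of-round averages as stand-ins for the two mean-field terms (`phi_w` for the average parameters, `phi_g` for the average gradient). All function and parameter names are hypothetical and this is not the authors' implementation.

```python
def adaptive_lr(eta0, lam, deviation):
    """Assumed rule: shrink the step as the local model drifts from the average."""
    return eta0 / (1.0 + lam * deviation)


def local_training(w_start, center, phi_w, phi_g, epochs=5, eta0=0.1, lam=1.0):
    """Local epochs for one client on the toy loss f(w) = 0.5 * (w - center)**2.

    phi_w and phi_g stand in for the mean-field estimates of the average
    parameters and gradients, so no client-to-client exchange happens here.
    """
    w = w_start
    for _ in range(epochs):
        grad = w - center                 # local gradient of the toy quadratic
        deviation = abs(w - phi_w)        # penalty: distance from the estimated average model
        eta = adaptive_lr(eta0, lam, deviation)
        w -= eta * (grad + phi_g)         # SGD step plus the (assumed) aggregated-gradient term
    return w


def run(centers, rounds=50, w_init=0.0):
    """Server loop: broadcast, local training, plain averaging."""
    w_global = w_init
    for _ in range(rounds):
        # Stand-in mean-field terms, fixed for the whole round (an assumption).
        phi_w = w_global
        phi_g = sum(w_global - c for c in centers) / len(centers)
        local = [local_training(w_global, c, phi_w, phi_g) for c in centers]
        w_global = sum(local) / len(local)
    return w_global


if __name__ == "__main__":
    # Three clients with different optima mimic Non-IID data.
    print(run([-1.0, 0.0, 4.0]))
```

Because each client's learning rate depends on its own deviation, clients take different effective step sizes, so the averaged model in this sketch need not land exactly on the mean of the client optima.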

Toward Communication Efficient Adaptive Gradient Method

On the Convergence of Decentralized Adaptive Gradient Methods

Adaptive Batchsize Selection and Gradient Compression for Wireless Federated Learning

Communication-Efficient Distributed Learning via Sparse and Adaptive Stochastic Gradient

Lazily Aggregated Quantized Gradient Innovation for Communication-Efficient Federated Learning

A Stochastic Asynchronous Gradient Descent Algorithm with Delay Compensation Mechanism

A Novel Adaptive Gradient Compression Approach for Communication-Efficient Federated Learning

Communication-Efficient Adaptive Batch Size Strategies for Distributed Local Gradient Methods

Communication-Compressed Adaptive Gradient Method for Distributed Nonconvex Optimization

Communication-Adaptive Stochastic Gradient Methods for Distributed Learning

Peering Beyond the Gradient Veil with Distributed Auto Differentiation

On the Convergence of Communication-Efficient Local SGD for Federated Learning

CADA: Communication-Adaptive Distributed Adam

A Derivative-Incorporated Adaptive Gradient Method for Federated Learning

Decentralised Federated Learning with Adaptive Partial Gradient Aggregation

AFedAvg: communication-efficient federated learning aggregation with adaptive communication frequency and gradient sparse

Accelerated Stochastic ExtraGradient: Mixing Hessian and Gradient Similarity to Reduce Communication in Distributed and Federated Learning

Communication-Efficient Federated Hypergradient Computation via Aggregated Iterative Differentiation

Communication-Efficient Nonconvex Federated Learning with Error Feedback for Uplink and Downlink

Adaptive Gradient Sparsification for Efficient Federated Learning: An Online Learning Approach

FedAgg: Adaptive Federated Learning with Aggregated Gradients