Abstract: Federated Learning (FL) is popular for communication-efficient learning from distributed data. To utilize data at different clients without moving them to the cloud, algorithms such as Federated Averaging (FedAvg) adopt a computation-then-aggregation model, in which multiple local updates are performed using local data before aggregation. These algorithms fail to work when faced with practical challenges, e.g., local data that are not independent and identically distributed (non-IID). In this paper, we first characterize the behavior of the FedAvg algorithm and show that, without strong and unrealistic assumptions on the problem structure, it can behave erratically. Aiming to design FL algorithms that are provably fast and require as few assumptions as possible, we propose a new algorithm design strategy from the primal-dual optimization perspective. Our strategy yields algorithms that can deal with non-convex objective functions, achieve the best possible optimization and communication complexity (in a well-defined sense), and accommodate full-batch and mini-batch local computation models. Importantly, the proposed algorithms are communication efficient, in that the communication effort can be reduced when the level of heterogeneity among the local data also reduces. In the extreme case where the local data become homogeneous, only $\mathcal{O}(1)$ communication is required among the agents. To the best of our knowledge, this is the first algorithmic framework for FL that achieves all the above properties.
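The computation-then-aggregation model described in the abstract can be illustrated with a minimal sketch. This is a toy example under stated assumptions, not the paper's algorithm or analysis: each client is assumed to hold a scalar quadratic objective f_i(x) = 0.5 * a_i * (x - c_i)^2, and the names `local_sgd`, `fedavg`, and `clients` are hypothetical and chosen only for illustration.

```python
# Toy sketch of FedAvg-style computation-then-aggregation.
# Assumption: client i holds f_i(x) = 0.5 * a_i * (x - c_i)**2 (scalar x).
# All names here are illustrative, not taken from the paper.


def local_sgd(x, a, c, lr, steps):
    """Run `steps` gradient steps on the local objective, starting from x."""
    for _ in range(steps):
        grad = a * (x - c)  # gradient of 0.5 * a * (x - c)**2
        x = x - lr * grad
    return x


def fedavg(clients, x0=0.0, lr=0.1, local_steps=5, rounds=200):
    """Repeat: broadcast x, run local updates on each client, average."""
    x = x0
    for _ in range(rounds):
        # Computation: every client starts from the current global model.
        local_models = [local_sgd(x, a, c, lr, local_steps) for a, c in clients]
        # Aggregation: the server takes the plain average of the local models.
        x = sum(local_models) / len(local_models)
    return x


# Three clients with identical curvature (a = 1) but different minimizers c_i.
clients = [(1.0, 0.0), (1.0, 2.0), (1.0, 4.0)]
x_hat = fedavg(clients)
```

In this simple setting the averaged model settles at the mean of the client minimizers, which is also the minimizer of the average objective. The sketch only shows the communication pattern (several local steps per aggregation round); it does not reproduce the erratic behavior or the primal-dual strategy discussed in the abstract.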

Layer-wise and Dimension-wise Locally Adaptive Federated Learning

FedDGP: Disentangling Global and Personal Models for Federated Learning

Locally Adaptive Federated Learning

Efficient Federated Learning Using Layer-Wise Regulation and Momentum Aggregation

Federated Learning with Flexible Architectures

Layer-wise Adaptive Model Aggregation for Scalable Federated Learning

Towards Layer-Wise Personalized Federated Learning: Adaptive Layer Disentanglement via Conflicting Gradients

Adaptive Federated Learning on Non-IID Data with Resource Constraint

FedCAda: Adaptive Client-Side Optimization for Accelerated and Stable Federated Learning

FedAgg: Adaptive Federated Learning with Aggregated Gradients

Federated Adversarial Learning: A Framework with Convergence Analysis

Federated Mutual Learning

Efficient Federated Learning via Local Adaptive Amended Optimizer with Linear Speedup

FedSAE: A Novel Self-Adaptive Federated Learning Framework in Heterogeneous Systems

FedPD: A Federated Learning Framework With Adaptivity to Non-IID Data

Lazy Aggregation for Heterogeneous Federated Learning

FedLion: Faster Adaptive Federated Optimization with Fewer Communication

AdaFed: Fair Federated Learning via Adaptive Common Descent Direction

Fisher Information-based Efficient Curriculum Federated Learning with Large Language Models

Communication-Efficient Model Aggregation with Layer Divergence Feedback in Federated Learning

FedDA: Faster Framework of Local Adaptive Gradient Methods via Restarted Dual Averaging