What problem does this paper attempt to address?

The problem this paper addresses is that the theoretical foundations of local optimization methods in Federated Learning (FL) are not yet fully understood. Specifically, the paper pursues three main directions:

1. **Establishing Sharp Theoretical Bounds**
   - The paper first establishes sharp lower bounds for Federated Averaging (FedAvg), the most popular federated learning algorithm. By introducing the concept of "iterate bias", it shows that FedAvg can drift away from the optimal solution in some cases, and it proposes a way to alleviate this problem under a third-order smoothness assumption.
   - It also explains this phenomenon from the perspective of Stochastic Differential Equations (SDEs).
2. **Improving Acceleration Methods**
   - The paper proposes Federated Accelerated Stochastic Gradient Descent (FedAc), the first principled acceleration of FedAvg. FedAc improves both the convergence speed and the communication efficiency. Through a potential-based perturbed iterate analysis and a stability analysis of generalized accelerated SGD, the paper strikes a principled balance in the trade-off between acceleration and stability.
3. **Extending to More Complex Optimization Problems**
   - The paper studies the Federated Composite Optimization (FCO) problem, whose objective includes a non-smooth regularization term. Directly extending FedAvg to FCO can lead to the "curse of primal averaging": simply averaging the client models can cancel the intended effect of the regularizer (for example, averaging sparse client models with different supports yields a dense model). To address this, the paper proposes a new primal-dual algorithm, Federated Dual Averaging (FedDualAvg), which overcomes the problem by averaging in the dual space.

### Formula Summary

- **Iterate Bias**: the deviation between the expectation of the SGD trajectory and the noise-free gradient descent trajectory started from the same initialization. For convex and smooth objectives, even when SGD starts at the optimal solution \(x^{\star}\), the mean of the iterates after \(k\) steps with step size \(\eta\) can deviate from \(x^{\star}\) by \(\Theta(\eta^2 k^{3/2})\):
  \[ \mathbb{E}[x_k]-x^{\star}=\Theta(\eta^2 k^{3/2}) \]
- **Improvement under the Third-Order Smoothness Assumption**: under third-order smoothness, the iterate bias shrinks to \(\Theta(\eta^3 k^2)\):
  \[ \mathbb{E}[x_k]-x^{\star}=\Theta(\eta^3 k^2) \]
- **Convergence Rate of FedAvg**: in the convex and smooth case, FedAvg converges at the rate
  \[ O\left(\frac{LB^2}{KR}+\frac{\sigma B}{\sqrt{MKR}}+\frac{L^{1/3}\sigma^{2/3}B^{4/3}}{K^{1/3}R^{2/3}}+\frac{L^{1/3}\zeta^{2/3}B^{4/3}}{R^{2/3}}\right) \]
  where \(M\) is the number of clients, \(K\) the number of local steps per round, \(R\) the number of communication rounds, \(L\) the smoothness constant, \(B\) the distance from the initial point to the optimum, \(\sigma^2\) the variance of the stochastic gradients, and \(\zeta\) a measure of the data heterogeneity across clients.

Through these results, the paper aims to strengthen the theoretical foundations of local optimization methods in federated learning, providing both a deeper understanding and more effective algorithm designs.
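To make the notion of iterate bias concrete, here is a minimal Monte Carlo sketch in Python with NumPy. It is not the paper's construction: the objective \(f(x)=\mathrm{softplus}(x)-c\,x\), the additive Gaussian gradient noise, and all parameter values (`c`, `h`, `eta`, `k`, `noise_std`) are illustrative assumptions, and `iterate_bias` is a hypothetical helper. Because gradient descent started at \(x^{\star}\) never moves, the gap \(\mathbb{E}[x_k]-x^{\star}\) estimated here is exactly the iterate bias. A quadratic, whose gradient is linear, serves as a zero-bias control.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def iterate_bias(grad, x_star, eta, k, noise_std, n_runs=100_000, seed=0):
    """Monte Carlo estimate of E[x_k] - x_star for SGD started at x_star.

    Gradient descent started at the minimizer never moves, so this gap equals
    the distance between the mean SGD iterate and the noise-free GD iterate.
    """
    rng = np.random.default_rng(seed)
    x = np.full(n_runs, x_star, dtype=float)  # every run starts at the optimum
    for _ in range(k):
        noise = rng.normal(0.0, noise_std, size=n_runs)  # assumed additive Gaussian noise
        x = x - eta * (grad(x) + noise)
    return x.mean() - x_star


# Illustrative convex, smooth objective with a nonzero third derivative:
# f(x) = softplus(x) - c * x, so f'(x) = sigmoid(x) - c and x* = log(c / (1 - c)).
c = 0.25
x_star_softplus = np.log(c / (1.0 - c))


def grad_softplus(x):
    return sigmoid(x) - c


# Control: f(x) = 0.5 * h * x^2 has a linear gradient, so E[x_k] = x* exactly.
h = 0.1875


def grad_quadratic(x):
    return h * x


bias_softplus = iterate_bias(grad_softplus, x_star_softplus, eta=0.5, k=50, noise_std=1.0)
bias_quadratic = iterate_bias(grad_quadratic, 0.0, eta=0.5, k=50, noise_std=1.0)
print(f"softplus bias:  {bias_softplus:+.3f}")
print(f"quadratic bias: {bias_quadratic:+.3f}")
```

For the softplus objective the restoring gradient saturates at \(c=0.25\) to the left of \(x^{\star}\) but at \(1-c=0.75\) to the right, so noisy excursions to the left persist longer and the mean iterate drifts left. For the quadratic, the expected update is linear in \(x_k\), so the mean stays at \(x^{\star}\) up to Monte Carlo error.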

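The curse of primal averaging, and why averaging in the dual space avoids it, can be illustrated with a small deterministic experiment. This is a simplified sketch under stated assumptions, not the paper's exact algorithm: two hypothetical clients with losses \(\tfrac12\|x-a_m\|^2\), an \(\ell_1\) regularizer \(\lambda\|x\|_1\), full (noise-free) gradients, a Euclidean mirror map, and a server step size of 1. The hypothetical `fed_primal_averaging` runs local proximal gradient steps and averages primal models; `fed_dual_averaging` follows the dual-averaging idea behind FedDualAvg, where each client accumulates gradients in a dual state and recovers its primal point by soft-thresholding with the total step size \(\tilde\eta=\eta(rK+k)\).

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


# Hypothetical setup: client m has loss 0.5 * ||x - a_m||^2; regularizer lam * ||x||_1.
anchors = np.array([[1.2, 0.0],
                    [0.0, 1.2]])
lam, eta, K, R = 1.0, 0.1, 10, 50


def grad(x, a):
    return x - a


def fed_primal_averaging():
    """FedAvg-style proximal GD: clients take prox steps, server averages primal models."""
    x = np.zeros(2)
    for _ in range(R):
        local_models = []
        for a in anchors:
            y = x.copy()
            for _ in range(K):
                y = soft_threshold(y - eta * grad(y, a), eta * lam)
            local_models.append(y)
        x = np.mean(local_models, axis=0)  # averaging in the primal space
    return x


def fed_dual_averaging():
    """Simplified dual averaging: clients accumulate gradients in w, server averages w."""
    z = np.zeros(2)
    for r in range(R):
        local_duals = []
        for a in anchors:
            w = z.copy()
            for k in range(K):
                eta_tilde = eta * (r * K + k)          # total step size taken so far
                y = soft_threshold(w, eta_tilde * lam)  # map the dual state to a primal point
                w = w - eta * grad(y, a)
            local_duals.append(w)
        z = np.mean(local_duals, axis=0)  # averaging in the dual space
    return soft_threshold(z, eta * R * K * lam)


x_primal = fed_primal_averaging()
x_dual = fed_dual_averaging()
x_opt = soft_threshold(anchors.mean(axis=0), lam)  # exact minimizer of the global objective
print("primal averaging:", x_primal)
print("dual averaging:  ", x_dual)
print("optimum:         ", x_opt)
```

Here the global minimizer is \(\mathrm{soft}(\bar a,\lambda)=0\), because the averaged anchor \(\bar a=(0.6,0.6)\) lies inside the threshold \(\lambda=1\). Each client's local prox steps drive its own model toward a sparse but nonzero point, so primal averaging settles at a dense point with small positive entries. The dual states, by contrast, accumulate roughly \(\eta t\bar a\) while the threshold grows as \(\eta t\lambda\), so the server's recovered primal point is exactly zero.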
On Principled Local Optimization Methods for Federated Learning

Understanding the Training Dynamics in Federated Deep Learning via Aggregation Weight Optimization

Optimizing Federated Learning on Non-IID Data Using Local Shapley Value

Preconditioned Federated Learning

Accelerated Federated Learning with Decoupled Adaptive Optimization

FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity to Non-IID Data

Communication-Efficient Zeroth-Order Adaptive Optimization for Federated Learning

A Generalized Look at Federated Learning: Survey and Perspectives

Efficient Federated Learning via Local Adaptive Amended Optimizer with Linear Speedup

Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning

Federated Learning with Unbiased Gradient Aggregation and Controllable Meta Updating

Review of Mathematical Optimization in Federated Learning

Personalized Federated Learning: A Unified Framework and Universal Optimization Techniques

Continual Local Training for Better Initialization of Federated Models

Over-the-Air Federated Learning and Optimization

On the Convergence of Communication-Efficient Local SGD for Federated Learning

FedPD: A Federated Learning Framework With Adaptivity to Non-IID Data

When Decentralized Optimization Meets Federated Learning

Addressing Algorithmic Disparity and Performance Inconsistency in Federated Learning

FedGrad: Optimisation in Decentralised Machine Learning

Federated Learning with Additional Mechanisms on Clients to Reduce Communication Costs