On Principled Local Optimization Methods for Federated Learning

Honglin Yuan
2024-01-24
Abstract: Federated Learning (FL), a distributed learning paradigm that scales on-device learning collaboratively, has emerged as a promising approach for decentralized AI applications. Local optimization methods such as Federated Averaging (FedAvg) are the most prominent methods for FL applications. Despite their simplicity and popularity, the theoretical understanding of local optimization methods is far from clear. This dissertation aims to advance the theoretical foundation of local methods in the following three directions.
Machine Learning; Distributed, Parallel, and Cluster Computing; Optimization and Control
What problem does this paper attempt to address?
The problem this paper addresses is that the theoretical basis of local optimization methods in Federated Learning (FL) has not been fully clarified. Specifically, the paper pursues three main directions:

1. **Establishing Sharp Theoretical Bounds**:
   - The paper first establishes sharp lower bounds for Federated Averaging (FedAvg), the most popular federated learning algorithm. By introducing the concept of "iterate bias", the paper shows that FedAvg may deviate from the optimal solution in some cases, and proposes a way to alleviate this problem under a third-order smoothness assumption.
   - This phenomenon is explained from the perspective of Stochastic Differential Equations (SDEs).
2. **Improving Acceleration Methods**:
   - The paper proposes Federated Accelerated Stochastic Gradient Descent (FedAc), the first principled acceleration method for FedAvg. FedAc improves both convergence speed and communication efficiency. By introducing a potential-based perturbed iterate analysis and a stability analysis of generalized accelerated SGD, the paper addresses the trade-off between acceleration and stability.
3. **Expanding to More Complex Optimization Problems**:
   - The paper studies the Federated Composite Optimization (FCO) problem, which includes non-smooth regularization terms. Directly extending FedAvg to FCO may lead to the "curse of primal-space averaging": simply averaging client models can nullify the effect of the regularizer. To address this, the paper proposes a new primal-dual algorithm, Federated Dual Averaging (FedDualAvg), which overcomes the problem by averaging in the dual space.

### Formula Summary

- **Iterate Bias**: Defined as the deviation between the expectation of the SGD trajectory and the noise-free gradient descent trajectory with the same initialization. For convex and smooth objective functions, even when starting from the optimal solution, after \(k\) steps the mean of the SGD iterates may deviate from the optimal solution at a rate of \(\Theta(\eta^2 k^{3/2})\):
  \[ \mathbb{E}[x_k]-x^{\star}=\Theta(\eta^2 k^{3/2}) \]
- **Improvement under the Third-Order Smoothness Assumption**: Under the third-order smoothness assumption, the iterate bias is reduced to \(\Theta(\eta^3 k^2)\):
  \[ \mathbb{E}[x_k]-x^{\star}=\Theta(\eta^3 k^2) \]
- **Convergence Rate of FedAvg**: In the convex and smooth case, the convergence rate of FedAvg is:
  \[ O\left(\frac{LB^2}{KR}+\frac{\sigma B}{\sqrt{MKR}}+\frac{L^{1/3}\sigma^{2/3}B^{4/3}}{K^{1/3}R^{2/3}}+\frac{L^{1/3}\zeta^{2/3}B^{4/3}}{R^{2/3}}\right) \]

Through this work, the paper aims to strengthen the theoretical foundation of local optimization methods in federated learning, providing a deeper understanding and enabling more effective algorithm design.
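
The "curse of primal-space averaging" from the third direction can be illustrated with a small numerical sketch. It is not the paper's FedDualAvg algorithm, only a one-shot comparison of the two aggregation rules under assumptions made here: the regularizer is an \(\ell_1\) penalty, each client recovers its model from a dual state by soft-thresholding, and the threshold `TAU` and the matrix `DUAL_STATES` are hypothetical values chosen for illustration.

```python
import numpy as np


def soft_threshold(z, tau):
    """Proximal map of tau * ||.||_1: shrink every coordinate toward zero by tau."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)


def primal_average(dual_states, tau):
    """Each client maps its dual state to a sparse model; the server averages the models."""
    client_models = soft_threshold(dual_states, tau)
    return client_models.mean(axis=0)


def dual_average(dual_states, tau):
    """The server averages the dual states first, then maps the average to a model."""
    return soft_threshold(dual_states.mean(axis=0), tau)


# Hypothetical dual states for three clients (one row per client).
# Each client has a single large coordinate, but at a different position.
DUAL_STATES = np.array([
    [2.5, 0.5, -0.2, 0.0],
    [0.4, -2.5, 0.1, 0.3],
    [-0.3, 0.2, 4.0, -0.1],
])
TAU = 1.0  # hypothetical soft-threshold level

client_models = soft_threshold(DUAL_STATES, TAU)
w_primal = primal_average(DUAL_STATES, TAU)
w_dual = dual_average(DUAL_STATES, TAU)

nnz_clients = [int(np.count_nonzero(w)) for w in client_models]
nnz_primal = int(np.count_nonzero(w_primal))
nnz_dual = int(np.count_nonzero(w_dual))

print("nonzeros per client model:", nnz_clients)
print("nonzeros after primal averaging:", nnz_primal)
print("nonzeros after dual averaging:", nnz_dual)
```

In this toy setup every client model is sparse, but the supports differ, so the primal average is denser than any individual client model. Averaging the dual states and thresholding afterwards keeps the aggregate sparse, which is the intuition behind averaging in the dual space.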