Abstract: In this paper, we present a novel analysis of FedAvg with constant step size, relying on the Markov property of the underlying process. We demonstrate that the global iterates of the algorithm converge to a stationary distribution and analyze its resulting bias and variance relative to the problem's solution. We provide a first-order expansion of the bias in both homogeneous and heterogeneous settings. Interestingly, this bias decomposes into two distinct components: one that depends solely on stochastic gradient noise and another on client heterogeneity. Finally, we introduce a new algorithm based on the Richardson-Romberg extrapolation technique to mitigate this bias.

### What problem does this paper attempt to solve?

This paper addresses the bias of the Federated Averaging (FedAvg) algorithm in federated learning. It focuses on two questions:

1. **Bias Caused by Heterogeneity and Randomness**:
   - **Heterogeneity Bias**: Because client data distributions are heterogeneous (each client has a different data distribution), FedAvg may converge to a point that deviates from the global optimal solution. This phenomenon is called "local drift". As the number of local update steps increases, each client tends toward the optimum of its own local data rather than the global optimum of the entire federation.
   - **Randomness Bias**: Updating with local stochastic gradients introduces a second bias. It depends on the variance of the gradients and the number of local update steps, similar to the phenomenon observed in standard stochastic gradient descent (SGD).
2. **How to Mitigate These Biases**:
   - The paper proposes a new algorithm based on Richardson-Romberg extrapolation to mitigate the biases caused by heterogeneity and randomness. The method reduces bias by combining the results obtained with different step sizes or different numbers of local updates, without additional memory cost.

### Main Contributions

The main contributions of the paper include:

- **Accurately Analyzing the Bias of FedAvg**: Through a detailed analysis of FedAvg, the authors show that the algorithm converges to a stationary distribution in the presence of heterogeneity and randomness, and they provide a first-order expansion of the bias. This bias decomposes into two parts: one depends only on the covariance of the stochastic gradients, and the other depends only on the heterogeneity of the clients.
- **Proposing a New Bias-Correction Method**: A new algorithm based on Richardson-Romberg extrapolation is introduced, which reduces bias without increasing the client's memory overhead. Experimental results show that this method outperforms existing bias-correction techniques, such as Scaffold, when the gradient variance is large.

### Formula Representation

Some of the key formulas from the paper are listed below, where \( \gamma \) is the step size, \( H \) the number of local update steps, \( N \) the number of clients, \( f_c \) the objective of client \( c \), and \( f \) the global objective.

- **First-Order Bias Expansion**:
  \[ \bar{\theta}^{(\gamma, H)}_{\text{det}} - \theta^\star = \gamma \left( \frac{H - 1}{2} b_h + O(\gamma H^2) \right) \]
  where
  \[ b_h = \frac{1}{N} \sum_{c = 1}^N \nabla^2 f(\theta^\star)^{-1} \left( \nabla^2 f_c(\theta^\star) - \nabla^2 f(\theta^\star) \right) \nabla f_c(\theta^\star). \]
- **Richardson-Romberg Extrapolation**:
  \[ \vartheta_t^{(\gamma, H)} = 2\theta_t^{(\gamma, H)} - \theta_t^{(2\gamma, H)} \]
  \[ \bar{\vartheta}_T^{(\gamma, H)} = \frac{1}{T} \sum_{t = 0}^{T - 1} \vartheta_t^{(\gamma, H)} \]

Through these contributions, the paper provides an in-depth understanding of the bias of FedAvg in federated learning and an effective way to mitigate it, thereby improving the performance of federated learning.
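The two formulas above can be checked numerically on a toy problem. The following is a minimal sketch, not the authors' implementation: it assumes noiseless (full-gradient) local steps and quadratic client objectives, so only the heterogeneity bias appears and the iterates converge to a single point, which is why the extrapolation is applied to the limit point rather than to an average over rounds. The matrices `A`, the centers `m`, the values of `gamma` and `H`, and the helper name `fedavg_limit` are all hypothetical choices made for illustration. The reasoning it illustrates: if the bias is \( \gamma b + O(\gamma^2) \) at step size \( \gamma \) and \( 2\gamma b + O(\gamma^2) \) at step size \( 2\gamma \), then \( 2\theta^{(\gamma, H)} - \theta^{(2\gamma, H)} \) cancels the first-order term.

```python
import numpy as np

# Hypothetical toy problem: N = 3 clients, each with a quadratic objective
# f_c(x) = 0.5 * (x - m_c)^T A_c (x - m_c). All values are illustrative.
A = np.array([np.diag([1.0, 2.0]), np.diag([3.0, 1.0]), np.diag([2.0, 4.0])])
m = np.array([[1.0, 0.0], [-1.0, 2.0], [0.5, -1.0]])


def fedavg_limit(A, m, gamma, H, rounds=2000):
    """Run noiseless FedAvg (H local gradient steps, then server averaging)
    for a fixed number of rounds and return the final global iterate."""
    theta = np.zeros(m.shape[1])
    for _ in range(rounds):
        local_iterates = []
        for A_c, m_c in zip(A, m):
            x = theta.copy()
            for _ in range(H):
                # Exact local gradient of f_c at x is A_c (x - m_c).
                x = x - gamma * (A_c @ (x - m_c))
            local_iterates.append(x)
        theta = np.mean(local_iterates, axis=0)
    return theta


gamma, H = 0.01, 5

# Global minimizer of f = mean_c f_c: solve (sum_c A_c) theta = sum_c A_c m_c.
theta_star = np.linalg.solve(A.sum(axis=0), np.einsum("cij,cj->i", A, m))

# First-order heterogeneity term b_h, following the formula quoted above.
hess = A.mean(axis=0)
b_h = np.mean(
    [np.linalg.solve(hess, (A_c - hess) @ (A_c @ (theta_star - m_c)))
     for A_c, m_c in zip(A, m)],
    axis=0,
)
predicted_bias = gamma * (H - 1) / 2 * b_h

# Plain FedAvg at step sizes gamma and 2*gamma, then the extrapolation.
theta_g = fedavg_limit(A, m, gamma, H)
theta_2g = fedavg_limit(A, m, 2 * gamma, H)
theta_rr = 2 * theta_g - theta_2g

plain_err = np.linalg.norm(theta_g - theta_star)
rr_err = np.linalg.norm(theta_rr - theta_star)

print("observed bias :", theta_g - theta_star)
print("predicted bias:", predicted_bias)
print("error, plain FedAvg :", plain_err)
print("error, extrapolated :", rr_err)
```

On this toy problem the observed bias should track the first-order prediction closely, and the extrapolated point should land much nearer to the minimizer than plain FedAvg. With stochastic gradients one would instead average the extrapolated iterates over rounds, as in the second Richardson-Romberg formula.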

Refined Analysis of Federated Averaging's Bias and Federated Richardson-Romberg Extrapolation

Decentralized Federated Averaging

Riemannian Federated Learning via Averaging Gradient Stream

FedExP: Speeding Up Federated Averaging via Extrapolation

A Non-parametric View of FedAvg and FedProx: Beyond Stationary Points

Robust Federated Averaging via Outlier Pruning

A Lightweight Method for Tackling Unknown Participation Statistics in Federated Averaging

On Convergence of Federated Averaging Langevin Dynamics

On the Convergence of FedAvg on Non-IID Data

A New Theoretical Perspective on Data Heterogeneity in Federated Optimization

FedStale: leveraging stale client updates in federated learning

FedDA: Faster Framework of Local Adaptive Gradient Methods via Restarted Dual Averaging

Partial model averaging in Federated Learning: Performance guarantees and benefits

SCAFFOLD: Stochastic Controlled Averaging for Federated Learning

The Aggregation-Heterogeneity Trade-off in Federated Learning

Federated Learning with Unbiased Gradient Aggregation and Controllable Meta Updating

Understanding and Improving Model Averaging in Federated Learning on Heterogeneous Data

Gradient Masked Averaging for Federated Learning

Federated Optimization of Smooth Loss Functions

On the effectiveness of partial variance reduction in federated learning with heterogeneous data