Refined Analysis of Federated Averaging's Bias and Federated Richardson-Romberg Extrapolation

Paul Mangold, Alain Durmus, Aymeric Dieuleveut, Sergey Samsonov, Eric Moulines
2024-12-02
Abstract: In this paper, we present a novel analysis of FedAvg with constant step size, relying on the Markov property of the underlying process. We demonstrate that the global iterates of the algorithm converge to a stationary distribution and analyze its resulting bias and variance relative to the problem's solution. We provide a first-order expansion of the bias in both homogeneous and heterogeneous settings. Interestingly, this bias decomposes into two distinct components: one that depends solely on stochastic gradient noise and another on client heterogeneity. Finally, we introduce a new algorithm based on the Richardson-Romberg extrapolation technique to mitigate this bias.
Machine Learning, Optimization and Control
### What problem does this paper attempt to solve?

This paper addresses the bias of the Federated Averaging (FedAvg) algorithm in federated learning. It focuses on two questions:

1. **Bias caused by heterogeneity and randomness**
   - **Heterogeneity bias**: When clients hold differently distributed data, FedAvg may converge to a point that deviates from the global optimum. This phenomenon is called "local drift". As the number of local update steps increases, each client moves toward the minimizer of its own local objective rather than the global minimizer of the whole federation.
   - **Randomness bias**: Updating with local stochastic gradients introduces a second source of bias. It depends on the variance of the gradients and on the number of local update steps, and it resembles the bias observed in standard stochastic gradient descent (SGD) with a constant step size.
2. **How to mitigate these biases**
   - The paper proposes a new algorithm based on Richardson-Romberg extrapolation to mitigate the biases caused by heterogeneity and randomness. The method reduces bias by combining the results obtained with different step sizes or different numbers of local updates, without requiring additional memory.

### Main Contributions

The main contributions of the paper are:

- **A precise analysis of the bias of FedAvg**: The authors show that, in the presence of heterogeneity and randomness, the algorithm converges to a stationary distribution, and they provide a first-order expansion of its bias. This bias decomposes into two parts: one depends only on the covariance of the stochastic gradients, and the other depends only on the heterogeneity of the clients.
- **A new bias-correction method**: A new algorithm based on Richardson-Romberg extrapolation reduces the bias without increasing the clients' memory overhead. Experimental results show that this method outperforms existing bias-correction techniques, such as Scaffold, when the gradient variance is large.

### Formula Representation

The key formulas from the paper are listed below. Here \( \gamma \) is the step size, \( H \) is the number of local update steps, \( N \) is the number of clients, \( f_c \) is the objective of client \( c \), \( f \) is the global objective, and \( \theta^\star \) is its minimizer.

- **First-order bias expansion**:
  \[ \bar{\theta}^{(\gamma, H)}_{\text{det}} - \theta^\star = \gamma \left( \frac{H - 1}{2} b_h + O(\gamma H^2) \right) \]
  where
  \[ b_h = \frac{1}{N} \sum_{c = 1}^N \nabla^2 f(\theta^\star)^{-1} \left( \nabla^2 f_c(\theta^\star) - \nabla^2 f(\theta^\star) \right) \nabla f_c(\theta^\star). \]
  The leading term vanishes when \( H = 1 \) or when all clients share the same Hessian at \( \theta^\star \).
- **Richardson-Romberg extrapolation**:
  \[ \vartheta_t^{(\gamma, H)} = 2\theta_t^{(\gamma, H)} - \theta_t^{(2\gamma, H)} \]
  \[ \bar{\vartheta}_T^{(\gamma, H)} = \frac{1}{T} \sum_{t = 0}^{T - 1} \vartheta_t^{(\gamma, H)} \]
  Because the leading bias term is linear in \( \gamma \), the combination \( 2\theta^{(\gamma, H)} - \theta^{(2\gamma, H)} \) cancels it and leaves only higher-order terms.

Through these contributions, the paper deepens the understanding of FedAvg's bias in federated learning and provides an effective way to mitigate it.
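The first-order bias expansion and the Richardson-Romberg combination above can be checked numerically on a toy problem. The following is a minimal sketch, not the authors' implementation: it assumes quadratic client objectives \( f_c(\theta) = \tfrac{1}{2}(\theta - m_c)^\top A_c (\theta - m_c) \) and exact (noise-free) local gradients, so only the heterogeneity bias appears and the FedAvg limit is a single point rather than a stationary distribution. All function names, the problem sizes, and the values of the step size and number of local steps are hypothetical choices made for illustration.

```python
import numpy as np


def make_clients(num_clients=5, dim=3, seed=0):
    """Random quadratic clients f_c(x) = 0.5 (x - m_c)^T A_c (x - m_c) (illustrative)."""
    rng = np.random.default_rng(seed)
    hessians, minimizers = [], []
    for _ in range(num_clients):
        m = rng.normal(size=(dim, dim))
        hessians.append(m @ m.T / dim + np.eye(dim))  # symmetric, eigenvalues >= 1
        minimizers.append(rng.normal(size=dim))
    return np.stack(hessians), np.stack(minimizers)


def global_minimizer(hessians, minimizers):
    """Solve mean_c A_c (theta - m_c) = 0 for theta."""
    a_bar = hessians.mean(axis=0)
    b_bar = np.einsum("cij,cj->i", hessians, minimizers) / len(hessians)
    return np.linalg.solve(a_bar, b_bar)


def fedavg_limit(hessians, minimizers, gamma, local_steps, rounds=3000):
    """Run deterministic FedAvg long enough to reach its fixed point."""
    theta = np.zeros(minimizers.shape[1])
    for _ in range(rounds):
        # Every client starts from the current global iterate.
        local = np.tile(theta, (len(hessians), 1))
        for _ in range(local_steps):
            grads = np.einsum("cij,cj->ci", hessians, local - minimizers)
            local = local - gamma * grads
        theta = local.mean(axis=0)  # server averaging
    return theta


def first_order_bias(hessians, minimizers, theta_star, gamma, local_steps):
    """Leading term gamma * (H - 1) / 2 * b_h of the bias expansion."""
    a_bar = hessians.mean(axis=0)
    local_grads = np.einsum("cij,cj->ci", hessians, theta_star - minimizers)
    hetero = np.einsum("cij,cj->i", hessians - a_bar, local_grads) / len(hessians)
    b_h = np.linalg.solve(a_bar, hetero)
    return gamma * (local_steps - 1) / 2 * b_h


hessians, minimizers = make_clients()
theta_star = global_minimizer(hessians, minimizers)
gamma, local_steps = 0.002, 10  # hypothetical values

theta_g = fedavg_limit(hessians, minimizers, gamma, local_steps)
theta_2g = fedavg_limit(hessians, minimizers, 2 * gamma, local_steps)
theta_rr = 2 * theta_g - theta_2g  # Richardson-Romberg combination

print("FedAvg bias norm:       ", np.linalg.norm(theta_g - theta_star))
print("first-order prediction: ",
      np.linalg.norm(first_order_bias(hessians, minimizers, theta_star, gamma, local_steps)))
print("extrapolated bias norm: ", np.linalg.norm(theta_rr - theta_star))
```

In the stochastic setting described in the paper, the two runs would use noisy gradients and the extrapolated iterates would be averaged over rounds, as in the formula for \( \bar{\vartheta}_T^{(\gamma, H)} \). The sketch omits both the noise and the averaging to keep the heterogeneity term isolated.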