Abstract: In this paper, we question the rationale behind propagating large numbers of parameters through a distributed system during federated learning. We start by examining the rank characteristics of the subspace spanned by gradients across epochs (i.e., the gradient-space) in centralized model training, and observe that this gradient-space often consists of a few leading principal components accounting for an overwhelming majority (95-99%) of the explained variance. Motivated by this, we propose the "Look-back Gradient Multiplier" (LBGM) algorithm, which exploits this low-rank property to enable gradient recycling between model update rounds of federated learning, reducing transmissions of large parameters to single scalars for aggregation. We analytically characterize the convergence behavior of LBGM, revealing the nature of the trade-off between communication savings and model performance. Our subsequent experimental results demonstrate the improvement LBGM obtains in communication overhead compared to conventional federated learning on several datasets and deep learning models. Additionally, we show that LBGM is a general plug-and-play algorithm that can be used standalone or stacked on top of existing sparsification techniques for distributed model training.
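The gradient-space measurement described in the abstract can be illustrated with a minimal sketch. It assumes each epoch's gradient is flattened into one vector and the vectors are stacked row-wise. It then uses PCA (via an SVD of the centered matrix) to count how many leading principal components reach a target share of the explained variance. The helper name `components_for_variance` and the synthetic low-rank "gradients" are hypothetical and purely illustrative. They are not the paper's code or data.

```python
import numpy as np


def components_for_variance(grads: np.ndarray, target: float = 0.95) -> int:
    """Count the leading principal components needed to explain `target`
    of the variance in `grads` (shape: num_epochs x num_params).
    Hypothetical helper for illustration only."""
    # Center each parameter coordinate across epochs, as PCA does.
    centered = grads - grads.mean(axis=0, keepdims=True)
    # Squared singular values of the centered matrix are proportional
    # to the variance captured by each principal component.
    var = np.linalg.svd(centered, compute_uv=False) ** 2
    cumulative = np.cumsum(var) / var.sum()
    # First position where the cumulative ratio reaches the target (1-based count).
    return int(np.searchsorted(cumulative, target) + 1)


# Illustrative synthetic data (not from the paper): 50 "epoch gradients" of
# 1000 parameters lying mostly in a 3-dimensional subspace, plus small noise.
rng = np.random.default_rng(0)
basis = rng.standard_normal((3, 1000))
coeffs = rng.standard_normal((50, 3))
grads = coeffs @ basis + 0.01 * rng.standard_normal((50, 1000))
print(components_for_variance(grads, target=0.95))
```

On real training runs, `grads` would hold flattened model gradients collected once per epoch. A small return value relative to the number of epochs is what "low-rank gradient-space" means in this setting.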

What problem does this paper attempt to address?

The problem this paper attempts to solve is how to reduce the number of parameters transmitted during model updates in Federated Learning (FL), and thereby reduce communication overhead. Specifically, the paper pursues this goal by analyzing the low-rank characteristics of the gradient subspace.

### Problem Background

Federated Learning is a distributed machine-learning paradigm that allows multiple devices or clients to jointly train a model without sharing their raw data. However, as neural-network models grow larger (for example, containing millions to billions of parameters), transmitting these parameters during Federated Learning incurs significant communication overhead. This not only increases bandwidth requirements but can also slow down training.

### Core Assumptions of the Paper

The main assumption of the paper is:

- **Low-rank characteristics of the gradient subspace**: The gradient subspace generated during Stochastic Gradient Descent (SGD) is usually low-rank; that is, most of its variance can be explained by a few principal components. This means a new gradient can be approximated using these principal components, reducing the amount of data that must be transmitted.

### Proposed Method

Based on this assumption, the paper proposes an algorithm named "Look-back Gradient Multiplier" (LBGM). The main ideas of LBGM are:

1. **Gradient reuse**: Newly generated gradients are represented using previously transmitted gradients, so only a scalar (the projection coefficient of the new gradient onto the previously transmitted "look-back gradient") needs to be transmitted instead of the entire gradient vector.
2. **Dynamic update**: Only when the new gradient deviates from the look-back gradient by more than a certain threshold is the complete gradient vector transmitted, and it then replaces the look-back gradient.

### Main Contributions

1. **Verification of low-rank characteristics**: Experiments on multiple neural-network models and datasets verify the low-rank characteristics of the gradient subspace.
2. **Design and analysis of the LBGM algorithm**: The LBGM algorithm is proposed and its convergence is analyzed theoretically, revealing the trade-off between communication savings and model performance.
3. **Experimental results**: The reduction in communication overhead achieved by LBGM is demonstrated on different datasets and deep-learning models, both as a standalone solution and when combined with other compression techniques such as sparsification.

### Summary

This paper proposes an effective method for reducing communication overhead in Federated Learning by exploiting the low-rank characteristics of the gradient subspace. Through its gradient-reuse and dynamic-update mechanisms, LBGM substantially reduces communication costs while largely preserving model performance.
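The gradient-reuse and dynamic-update rules can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes flattened NumPy gradient vectors and a threshold `delta` on the squared sine of the angle between a new gradient and the stored look-back gradient. The class names `LBGMWorker` and `LBGMServer`, the `delta` default, and the `("scalar", rho)` / `("full", grad)` message format are all hypothetical.

```python
import numpy as np


class LBGMWorker:
    """Hypothetical client-side sketch of LBGM-style gradient recycling."""

    def __init__(self, delta: float = 0.2):
        self.delta = delta      # assumed threshold on sin^2 of the look-back angle
        self.lookback = None    # last full gradient sent to the server

    def message(self, grad: np.ndarray):
        """Return ("scalar", rho) if the look-back gradient approximates `grad`
        well enough, otherwise ("full", grad) and refresh the look-back gradient."""
        if self.lookback is not None:
            l2 = float(self.lookback @ self.lookback)
            g2 = float(grad @ grad)
            dot = float(grad @ self.lookback)
            if l2 > 0.0 and g2 > 0.0:
                rho = dot / l2                      # projection coefficient onto look-back gradient
                sin2 = 1.0 - dot * dot / (l2 * g2)  # sin^2 of the angle between the two vectors
                if sin2 <= self.delta:
                    return "scalar", rho
        self.lookback = grad.copy()
        return "full", grad.copy()


class LBGMServer:
    """Hypothetical server-side sketch: stores each worker's look-back gradient."""

    def __init__(self):
        self.lookbacks = {}

    def aggregate(self, messages: dict, weights: dict) -> np.ndarray:
        """messages: {worker_id: (kind, payload)}; weights: {worker_id: float}."""
        total = None
        for wid, (kind, payload) in messages.items():
            if kind == "full":
                self.lookbacks[wid] = payload.copy()
                update = payload
            else:
                # Recycle the stored look-back gradient, scaled by the received scalar.
                update = payload * self.lookbacks[wid]
            contrib = weights[wid] * update
            total = contrib if total is None else total + contrib
        return total


# Usage: the second gradient is nearly parallel to the first, so only a scalar is sent.
rng = np.random.default_rng(1)
worker, server = LBGMWorker(delta=0.2), LBGMServer()
g1 = rng.standard_normal(1000)
g2 = 0.5 * g1 + 0.01 * rng.standard_normal(1000)
for g in (g1, g2):
    kind, payload = worker.message(g)
    update = server.aggregate({0: (kind, payload)}, {0: 1.0})
    print(kind)
```

In this sketch, the relative squared error of the recycled update, ||g - rho * lookback||² / ||g||², equals the sin² value that is compared against `delta`. A larger `delta` therefore sends scalars more often at the cost of coarser updates, which is one way to picture the communication/performance trade-off the paper analyzes.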

Recycling Model Updates in Federated Learning: Are Gradient Subspaces Low-Rank?

FedDGP: Disentangling Global and Personal Models for Federated Learning

Understanding the Training Dynamics in Federated Deep Learning via Aggregation Weight Optimization

Lazily Aggregated Quantized Gradient Innovation for Communication-Efficient Federated Learning

Adaptive Gradient Sparsification for Efficient Federated Learning: An Online Learning Approach

AsyncFedED: Asynchronous Federated Learning with Euclidean Distance Based Adaptive Weight Aggregation

Federated Learning with Unbiased Gradient Aggregation and Controllable Meta Updating

Efficient Asynchronous Vertical Federated Learning via Gradient Prediction and Double-End Sparse Compression

Federated Dynamical Low-Rank Training with Global Loss Convergence Guarantees

Distributed Gradient Descent with Many Local Steps in Overparameterized Models

Revisiting Communication-Efficient Federated Learning with Balanced Global and Local Updates

FedRepOpt: Gradient Re-parametrized Optimizers in Federated Learning

AFedAvg: communication-efficient federated learning aggregation with adaptive communication frequency and gradient sparse

Save It All: Enabling Full Parameter Tuning for Federated Large Language Models via Cycle Block Gradient Descent

Reducing Impacts of System Heterogeneity in Federated Learning using Weight Update Magnitudes

Gradient Masked Averaging for Federated Learning

Accelerating Federated Learning by Selecting Beneficial Herd of Local Gradients

Communication-Efficient Model Aggregation with Layer Divergence Feedback in Federated Learning

FedAgg: Adaptive Federated Learning with Aggregated Gradients

Fusion of Global and Local Knowledge for Personalized Federated Learning