Abstract: Federated learning (FL) is a general framework for learning across an axis of group partitioned data (heterogeneous clients) while preserving data privacy, under the orchestration of a central server. FL methods often compute gradients of loss functions purely locally (i.e., entirely at each client, or entirely at the server), typically using automatic differentiation (AD) techniques. We propose a federated automatic differentiation (FAD) framework that 1) enables computing derivatives of functions involving client and server computation as well as communication between them and 2) operates in a manner compatible with existing federated technology. In other words, FAD computes derivatives across communication boundaries. We show, in analogy with traditional AD, that FAD may be implemented using various accumulation modes, which introduce distinct computation-communication trade-offs and systems requirements. Further, we show that a broad class of federated computations is closed under these various modes of FAD, implying in particular that if the original computation can be implemented using privacy-preserving primitives, its derivative may be computed using only these same primitives. We then show how FAD can be used to create algorithms that dynamically learn components of the algorithm itself. In particular, we show that FedAvg-style algorithms can exhibit significantly improved performance by using FAD to adjust the server optimization step automatically, or by using FAD to learn weighting schemes for computing weighted averages across clients.
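
The closure claim in the abstract can be illustrated with the two simplest communication primitives. The block below is a sketch in our own notation, not the paper's: $x$ is a server value, $y_k$ is the value held by client $k \in \{1, \dots, K\}$, a dot marks a forward-mode tangent, and a bar marks a reverse-mode cotangent. Both primitives are linear, so their derivatives are again a broadcast or a sum.

```latex
\begin{align*}
\text{broadcast:}\quad & y_k = x \ \ \text{for all } k
  && \text{forward: } \dot{y}_k = \dot{x}
  && \text{reverse: } \bar{x} = \sum_{k=1}^{K} \bar{y}_k \\
\text{sum:}\quad & s = \sum_{k=1}^{K} y_k
  && \text{forward: } \dot{s} = \sum_{k=1}^{K} \dot{y}_k
  && \text{reverse: } \bar{y}_k = \bar{s} \ \ \text{for all } k
\end{align*}
```

Under these assumptions, forward mode reuses the communication pattern of the original computation, while reverse mode swaps broadcasts and sums. If the sum is implemented with a privacy-preserving aggregation, the derivative needs no primitive beyond that same aggregation and a broadcast.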

What problem does this paper attempt to address?

The paper asks how to apply automatic differentiation (AD) within the federated learning (FL) framework while remaining compatible with privacy-protection techniques. The authors propose a framework named "Federated Automatic Differentiation (Federated AD)" that targets the following problems:

1. **Compute the derivatives of functions involving communication between clients and servers**: Traditional AD methods usually apply only to functions evaluated on a single machine. In FL, computation is distributed across many clients and a server, so a new method is needed to differentiate these distributed computations.
2. **Ensure compatibility with existing federated techniques and privacy-protection mechanisms**: To protect data privacy, FL usually relies on techniques such as differential privacy and secure aggregation. The new differentiation method must be compatible with these techniques so that client data is not leaked when derivatives are computed.
3. **Dynamically adjust FL algorithms and their hyperparameters**: With Federated AD, the hyperparameters of an FL algorithm can be adjusted during training, improving performance and convergence speed. For example, a hyperparameter such as the learning rate can be optimized by gradient descent instead of manual tuning.

### Specific Problem Description

The paper mainly focuses on the following aspects:

- **How to perform differentiation through federated computing**: The authors propose a framework that computes derivatives of complex federated computations involving clients and servers without violating the data minimization principle of FL.
- **How to ensure the scalability of computations and privacy protection**: The proposed Federated AD framework can handle large-scale federated computations and computes derivatives while maintaining privacy.
- **How to design more effective FL algorithms using Federated AD**: Federated AD makes it possible to develop adaptive federated optimization algorithms that adjust their own behavior according to feedback during training, thereby improving performance.

### Method Overview

The authors propose three modes of Federated AD:

1. **Forward-Mode**: Similar to forward mode in traditional AD, but adapted to federated computing. It computes derivatives using the same communication pattern as the original computation, avoiding direct sharing of client data.
2. **Reverse-Mode**: Similar to reverse mode in traditional AD and suited to scenarios with high-dimensional inputs. It computes derivatives through back-propagation and can reduce the amount of communication in some cases.
3. **Mixed-Mode**: Combines the advantages of forward mode and reverse mode, selecting the appropriate computation method for each situation.

### Application Examples

Federated AD enables the following improvements:

- **Adaptive hyperparameter adjustment**: Hyperparameters such as the learning rate can be adjusted dynamically, improving the performance of algorithms such as FedAvg.
- **Adaptive client-weight allocation**: Client weights can be adjusted dynamically according to each client's contribution, improving the performance of the overall model.

In summary, the main objective of this paper is to simplify the design and optimization of FL algorithms by introducing Federated AD while preserving data privacy and computational efficiency.
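
The forward-mode idea and the learning-rate example above can be sketched in a few lines of plain Python. This is a minimal illustration under our own assumptions, not the authors' implementation: the `Dual` class, the helpers `federated_broadcast` and `federated_mean`, the scalar linear model, and the toy client data are all hypothetical. The point it shows is that value and tangent travel together through the same broadcast and mean steps as the original FedAvg-style round, so the server obtains the derivative of the post-round loss with respect to its own learning rate without any extra kind of communication.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A (value, tangent) pair for forward-mode differentiation (hypothetical helper)."""
    val: float
    tan: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.tan + other.tan)

    __radd__ = __add__

    def __sub__(self, other):
        other = _lift(other)
        return Dual(self.val - other.val, self.tan - other.tan)

    def __rsub__(self, other):
        return _lift(other) - self

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: d(ab) = da * b + a * db
        return Dual(self.val * other.val,
                    self.tan * other.val + self.val * other.tan)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (zero tangent)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def federated_broadcast(server_value, num_clients):
    """Server -> clients. The tangent is broadcast alongside the value."""
    return [server_value] * num_clients


def federated_mean(client_values):
    """Clients -> server. The tangent is averaged alongside the value."""
    total = Dual(0.0)
    for v in client_values:
        total = total + v
    return total * (1.0 / len(client_values))


def client_update(w, data, client_lr=0.1, steps=5):
    """Local SGD on 0.5 * (w * x - y)^2; returns the model delta."""
    w_local = w
    for _ in range(steps):
        for x, y in data:
            grad = (w_local * x - y) * x
            w_local = w_local - client_lr * grad
    return w_local - w


def client_loss(w, data):
    """Mean squared-error loss on one client's local data."""
    total = Dual(0.0)
    for x, y in data:
        r = w * x - y
        total = total + 0.5 * (r * r)
    return total * (1.0 / len(data))


def round_loss(w0, server_lr, clients):
    """One FedAvg-style round followed by a federated loss evaluation.

    Returns a Dual: .val is the loss, .tan is d(loss)/d(server_lr).
    """
    n = len(clients)
    w = Dual(w0)
    eta = Dual(server_lr, 1.0)  # seed: differentiate w.r.t. the server learning rate
    deltas = [client_update(w_k, d)
              for w_k, d in zip(federated_broadcast(w, n), clients)]
    w_new = w + eta * federated_mean(deltas)
    losses = [client_loss(w_k, d)
              for w_k, d in zip(federated_broadcast(w_new, n), clients)]
    return federated_mean(losses)


# Toy data: each client holds a few (x, y) pairs (made up for illustration).
clients = [[(1.0, 2.0), (2.0, 4.1)], [(1.5, 2.9)], [(0.5, 1.2), (1.0, 1.8)]]
out = round_loss(w0=0.0, server_lr=1.0, clients=clients)

# One hypergradient step on the server learning rate (step size is arbitrary).
new_server_lr = 1.0 - 0.1 * out.tan
```

Here each client only ever returns its scalar delta, its loss, and the matching tangents, all of which pass through `federated_mean`. A reverse-mode variant would instead send cotangents back from server to clients, trading one kind of communication for another.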

Federated Automatic Differentiation

FedDGP: Disentangling Global and Personal Models for Federated Learning

FedADMM: A Robust Federated Deep Learning Framework with Adaptivity to System Heterogeneity

FADAS: Towards Federated Adaptive Asynchronous Optimization

FedPD: A Federated Learning Framework With Adaptivity to Non-IID Data

Federated Learning Using Three-Operator ADMM

FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity to Non-IID Data.

Beyond ADMM: A Unified Client-variance-reduced Adaptive Federated Learning Framework

Federated Adversarial Learning: A Framework with Convergence Analysis

AdaFed: Fair Federated Learning via Adaptive Common Descent Direction

Accelerated Federated Learning with Decoupled Adaptive Optimization

FedDAA: a robust federated learning framework to protect privacy and defend against adversarial attack

FedDA: Faster Framework of Local Adaptive Gradient Methods via Restarted Dual Averaging

A differential privacy federated learning framework for accelerating convergence

Federated Learning Optimization Algorithm for Automatic Weight Optimal

Hierarchical Federated ADMM

Gradient Masked Averaging for Federated Learning

Federated Learning on Non-Independent and Identically Distributed Data

Federated Learning Algorithm Based on Adaptive Gradient Fusion

Preconditioned Federated Learning

FedCAda: Adaptive Client-Side Optimization for Accelerated and Stable Federated Learning