Federated Communication-Efficient Multi-Objective Optimization

Baris Askin, Pranay Sharma, Gauri Joshi, Carlee Joe-Wong
2024-10-22
Abstract: We study a federated version of multi-objective optimization (MOO), where a single model is trained to optimize multiple objective functions. MOO has been extensively studied in the centralized setting but is less explored in federated or distributed settings. We propose FedCMOO, a novel communication-efficient federated multi-objective optimization (FMOO) algorithm that improves the error convergence performance of the model compared to existing approaches. Unlike prior works, the communication cost of FedCMOO does not scale with the number of objectives, as each client sends a single aggregated gradient, obtained using randomized SVD (singular value decomposition), to the central server. We provide a convergence analysis of the proposed method for smooth non-convex objective functions under milder assumptions than in prior work. In addition, we introduce a variant of FedCMOO that allows users to specify a preference over the objectives in terms of a desired ratio of the final objective values. Through extensive experiments, we demonstrate the superiority of our proposed method over baseline approaches.
Machine Learning; Distributed, Parallel, and Cluster Computing
### What problem does this paper attempt to solve?

This paper studies **Federated Multi-Objective Optimization (FMOO)**: how to train a single model that optimizes multiple objective functions simultaneously in a distributed or federated learning environment.

#### Background and Challenges

1. **Centralized Multi-Objective Optimization (MOO)**: MOO has been widely studied in centralized settings but remains comparatively unexplored in federated or distributed settings.
2. **Communication Efficiency**: Existing federated learning algorithms usually target single-objective problems. A main challenge for FMOO is its high communication cost, which grows with both the number of clients and the number of objective functions.
3. **Data Heterogeneity**: Data distributions can differ substantially across clients, which makes learning harder.
4. **Negative Transfer**: In multi-task learning, optimizing one objective can degrade performance on the other tasks.

#### Contributions of the Paper

To address these challenges, the paper makes the following contributions:

1. **FedCMOO Algorithm**:
   - Proposes FedCMOO, a communication-efficient federated multi-objective optimization algorithm. Each client's gradients are aggregated into a single gradient vector using randomized SVD (singular value decomposition), so the communication cost does not grow linearly with the number of objectives.
   - Proves convergence of FedCMOO for smooth non-convex objective functions, with a sample complexity that has a better dependence on the number of objectives.
2. **Preference-Driven FedCMOO-Pref**:
   - Proposes FedCMOO-Pref, described as the first federated algorithm that optimizes toward user-specified ratios of the objective function values. Users steer the final trade-off by specifying the desired ratio of the final objective values.
3. **Experimental Verification**:
   - Extensive experiments show that the proposed methods outperform the baselines in performance and efficiency, especially on large-scale datasets with multiple objectives.

### Key Formulas

1. **Multi-Objective Optimization Problem**:

   \[ \min_{x \in \mathbb{R}^d} F(x) := [F_1(x), F_2(x), \ldots, F_M(x)]^\top \]

   where \( M \) is the number of objectives, \( F(x) \in \mathbb{R}^M \) stacks the \( M \) individual objective loss functions \( \{F_i\}_{i=1}^M \), and \( x \in \mathbb{R}^d \) is the shared model parameter.
2. **Pareto Optimal Solution**:
   - A solution \( x^* \) is Pareto optimal if no other \( x \in \mathbb{R}^d \) satisfies \( F_k(x) \leq F_k(x^*) \) for all \( k \in [M] \) together with \( F_{k'}(x) < F_{k'}(x^*) \) for some \( k' \in [M] \). In other words, no objective can be improved without worsening another.
3. **Multi-Gradient Descent Algorithm (MGDA)**:
   - The descent direction \( d^* \) maximizes the smallest decrease across all objectives after a step of size \( \eta \):

   \[ d^* = \arg\max_{d \in \mathbb{R}^d} \min_{k \in [M]} \left\{ F_k(x) - F_k(x - \eta d) \right\} \]
4. **Gram Matrix Approximation**:
   - At the beginning of each round, the server selects a set of clients and sends them the current model \( x^{(t)} \). The clients then compute the stochastic Jacobian matrix and send it to the server.
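
To make the link between the MGDA direction and the Gram matrix concrete, here is a minimal sketch that is not taken from the paper. It assumes the standard first-order reformulation of MGDA, in which the direction is \( d = J w \) with \( w = \arg\min_{w \in \Delta_M} w^\top G w \), where \( J \in \mathbb{R}^{d \times M} \) holds the per-objective gradients as columns and \( G = J^\top J \) is their Gram matrix. The function names `gram_matrix`, `min_norm_weights`, and `common_descent_direction` are hypothetical, and the Frank-Wolfe solver is one common choice rather than the solver FedCMOO necessarily uses. The sketch works on an exact, uncompressed Jacobian and does not model the randomized SVD compression or the client-server communication.

```python
import numpy as np


def gram_matrix(jacobian):
    """Return G = J^T J for a (d, M) Jacobian whose columns are per-objective gradients."""
    return jacobian.T @ jacobian


def min_norm_weights(gram, num_iters=200, tol=1e-10):
    """Approximately solve min_w w^T G w over the probability simplex with Frank-Wolfe.

    Illustrative solver (an assumption, not the paper's stated method).
    """
    m = gram.shape[0]
    w = np.full(m, 1.0 / m)           # start from uniform weights
    for _ in range(num_iters):
        grad = gram @ w               # half the gradient of w^T G w
        k = int(np.argmin(grad))      # best simplex vertex for the linearized problem
        gap = w @ grad - grad[k]      # Frank-Wolfe gap; zero at the optimum
        if gap <= tol:
            break
        diff = -w.copy()
        diff[k] += 1.0                # diff = e_k - w
        denom = diff @ gram @ diff
        # Exact line search for the quadratic objective, clipped to [0, 1].
        gamma = 1.0 if denom <= 0.0 else min(1.0, gap / denom)
        w = w + gamma * diff
    return w


def common_descent_direction(jacobian):
    """Return d = J w, the minimum-norm convex combination of the gradients."""
    w = min_norm_weights(gram_matrix(jacobian))
    return jacobian @ w
```

Only the \( M \times M \) Gram matrix enters the weight computation, so the solver's cost is independent of the model dimension \( d \). Stepping along \( x - \eta d \) with the returned direction decreases every objective to first order whenever \( d \neq 0 \).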