LEAP: Optimization Hierarchical Federated Learning on Non-IID Data with Coalition Formation Game

Jianfeng Lu, Yue Chen, Shuqin Cao, Longbiao Chen, Wei Wang, Yun Xin
2024-05-01
Abstract: Although Hierarchical Federated Learning (HFL) utilizes edge servers (ESs) to alleviate communication burdens, its model performance is degraded by non-IID data and limited communication resources. Current works often assume that data is uniformly distributed, which, however, contradicts the heterogeneity of IoT. Solutions that rely on additional model training to check the data distribution inevitably increase computational costs and the risk of privacy leakage. The challenges in solving these issues are how to reduce the impact of non-IID data without involving raw data and how to rationalize communication resource allocation to address the straggler problem. To tackle these challenges, we propose a novel optimization method based on coaLition formation gamE and grAdient Projection, called LEAP. Specifically, we innovatively combine edge data distribution with a coalition formation game to dynamically adjust the correlations between clients and ESs, which ensures optimal correlations. We further capture client heterogeneity to achieve rational bandwidth allocation at the coalition level and determine the optimal transmission power within specified delay constraints at the client level. Experimental results on four real datasets show that LEAP achieves a 20.62% improvement in model accuracy compared to state-of-the-art baselines. Moreover, LEAP effectively reduces transmission energy consumption by at least about 2.24 times.
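The client-ES association idea can be illustrated with a small switch-rule sketch. It is not the paper's algorithm. It assumes that each client shares only a per-class sample-count vector (a hypothetical summary, not raw data). Each ES coalition is scored by the L1 distance between its pooled label distribution and the global one, and a client switches ESs only when the summed distance strictly drops. Every accepted switch lowers that sum, and the number of partitions is finite, so the loop terminates in a partition where no single switch helps. The names `coalition_cost`, `total_cost`, and `form_coalitions`, the capacity limit, and the rule that no ES may be left empty are illustrative assumptions. The paper's actual preference relation and similarity measure may differ.

```python
import numpy as np


def coalition_cost(counts, members, global_dist):
    # L1 distance between a coalition's pooled label distribution and the
    # global label distribution (hypothetical metric; lower is better).
    pooled = counts[members].sum(axis=0)
    return float(np.abs(pooled / pooled.sum() - global_dist).sum())


def total_cost(counts, assign, n_es, global_dist):
    # Potential function: summed cost over all ESs (each ES must be non-empty).
    return sum(
        coalition_cost(counts, [i for i, k in enumerate(assign) if k == es], global_dist)
        for es in range(n_es)
    )


def form_coalitions(counts, assign, n_es, capacity, max_passes=100):
    """Switch clients between ESs until no single switch lowers the total cost.

    counts: (n_clients, n_classes) per-class sample counts (rows assumed non-zero).
    assign: initial ES index per client; every ES must start with >= 1 client.
    """
    counts = np.asarray(counts, dtype=float)
    assign = list(assign)
    global_dist = counts.sum(axis=0) / counts.sum()
    for _ in range(max_passes):
        moved = False
        for i in range(len(assign)):
            src = assign[i]
            if assign.count(src) == 1:
                continue  # hypothetical rule: never leave an ES empty
            best_es = src
            best_cost = total_cost(counts, assign, n_es, global_dist)
            for dst in range(n_es):
                if dst == src or assign.count(dst) >= capacity:
                    continue  # skip the current ES and any full ES
                assign[i] = dst  # tentatively switch
                cost = total_cost(counts, assign, n_es, global_dist)
                if cost < best_cost - 1e-12:
                    best_es, best_cost = dst, cost
                assign[i] = src  # undo the tentative switch
            if best_es != src:
                assign[i] = best_es
                moved = True
        if not moved:
            break  # switch-stable: no client can lower the cost by moving
    return assign


# Usage: two ESs that each start with single-class clients.
counts = [[10, 0], [10, 0], [0, 10], [0, 10]]
print(form_coalitions(counts, [0, 0, 1, 1], n_es=2, capacity=3))  # prints [1, 0, 0, 1]
```

In the resulting partition, each ES holds one client of each class, so both edge distributions match the global one.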
Computer Science and Game Theory
What problem does this paper attempt to address?
The main problem this paper attempts to solve is how to optimize the model performance of Hierarchical Federated Learning (HFL) under non-independent and identically distributed (non-IID) data and limited communication resources. Specifically, the paper focuses on two aspects:

1. **Reducing the influence of non-IID data**:
   - In federated learning, differences in user behavior patterns and data collection methods cause the data distribution of each client to vary greatly. Local data therefore cannot represent the overall data distribution, which hurts the generalization ability and performance of the model.
   - The paper proposes a coalition formation game that adjusts the associations between clients and edge servers (ESs), optimizing the edge data distribution and reducing the negative impact of non-IID data on model performance.
2. **Optimizing communication resource allocation and solving the straggler problem**:
   - In HFL, communication delay and instability are important factors affecting training efficiency. In synchronous federated learning in particular, the slowest client significantly affects the performance of the entire system.
   - The paper dynamically adjusts bandwidth allocation and determines the optimal transmission power, minimizing transmission energy consumption while meeting task delay requirements and thereby improving communication efficiency.

To solve these problems, the authors propose a new method named LEAP (Optimization Hierarchical Federated Learning on Non-IID Data with Coalition Formation Game).
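The power-versus-deadline trade-off in the second aspect can be made concrete with the Shannon rate formula. The source does not give LEAP's channel model, so this sketch assumes one. A client uploads `bits` over bandwidth B with channel gain g and noise power spectral density N0, giving a rate of B·log2(1 + p·g/(N0·B)). Meeting a deadline T then requires p ≥ (N0·B/g)·(2^(bits/(B·T)) - 1). The upload energy p·T shrinks as T grows, so using the whole delay budget, rather than transmitting as fast as possible, saves energy. The function names and toy numbers are hypothetical.

```python
import math


def min_power_for_deadline(bits, bandwidth_hz, deadline_s, gain, noise_psd):
    # Smallest p such that bandwidth * log2(1 + p*gain/(noise_psd*bandwidth)) * T >= bits.
    spectral_eff = bits / (bandwidth_hz * deadline_s)  # required bit/s/Hz
    snr = 2.0 ** spectral_eff - 1.0                     # required SNR
    return snr * noise_psd * bandwidth_hz / gain


def achieved_bits(power, bandwidth_hz, deadline_s, gain, noise_psd):
    # Bits delivered in the deadline at the given power (Shannon rate assumption).
    return bandwidth_hz * math.log2(1.0 + power * gain / (noise_psd * bandwidth_hz)) * deadline_s


def upload_energy(bits, bandwidth_hz, deadline_s, gain, noise_psd):
    # Energy when transmitting at the minimum power for exactly the deadline.
    p = min_power_for_deadline(bits, bandwidth_hz, deadline_s, gain, noise_psd)
    return p * deadline_s


# Toy numbers chosen so that noise_psd * bandwidth / gain is 1.
args = dict(bits=2e6, bandwidth_hz=1e6, gain=1e-6, noise_psd=1e-12)
print(upload_energy(deadline_s=1.0, **args))  # about 3.0 J
print(upload_energy(deadline_s=0.5, **args))  # about 7.5 J: a tighter deadline costs more
```

With these toy values, halving the deadline raises the energy by a factor of 2.5. A straggler forced to finish early pays the same kind of penalty.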
LEAP combines the coalition formation game with the gradient projection method to achieve the following goals:

- **Theoretically**: Study the influence of multi-dimensional attributes (time delay, energy consumption, data distribution) on HFL performance, transform the data distribution optimization problem into an edge association problem, and further optimize heterogeneous resource allocation.
- **Methodologically**: Analyze the relationship between edge association and edge data distribution similarity, construct a coalition formation game, and prove the existence of stable coalitions. Based on this, use the gradient projection method to compute the optimal bandwidth allocation for each coalition so that task delay requirements are met.
- **Experimentally**: Verify the effectiveness of LEAP on four real datasets. The results show that LEAP improves model accuracy by 20.62% compared with state-of-the-art baselines and reduces transmission energy consumption by at least about 2.24 times.

In conclusion, this paper aims to solve the data heterogeneity and communication bottleneck problems in HFL simultaneously through new optimization methods, improving both model performance and communication efficiency.
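The gradient projection step can be sketched with the same assumed Shannon model. Suppose coalition k must upload S_k bits within a deadline T. Its minimum upload energy as a function of its bandwidth b_k is E_k(b_k) = T·(N0/g_k)·b_k·(2^(S_k/(b_k·T)) - 1), which is convex and decreasing in b_k. The total bandwidth B is shared so that Σ b_k = B with b_k ≥ b_min. Projected gradient descent alternates a gradient step on Σ E_k with a Euclidean projection back onto that feasible set, here using the standard sort-based simplex projection. The objective, the lumping of each coalition into one uploader, the units (Mbit and MHz), the step size, and all names are assumptions for illustration, not LEAP's exact formulation.

```python
import numpy as np


def energy(b, bits, T, nog):
    # Minimum upload energy per coalition given bandwidth b (nog = N0 / gain).
    return T * nog * b * (2.0 ** (bits / (b * T)) - 1.0)


def energy_grad(b, bits, T, nog):
    # dE/db = T * nog * (2^x - 1 - x * ln2 * 2^x), with x = bits / (b * T).
    x = bits / (b * T)
    return T * nog * (2.0 ** x - 1.0 - x * np.log(2.0) * 2.0 ** x)


def project_to_simplex(v, total):
    # Euclidean projection of v onto {y >= 0, sum(y) = total} (sort-based method).
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - total
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)


def allocate_bandwidth(bits, nog, total_bw, T, b_min, lr=1.0, iters=500):
    bits = np.asarray(bits, dtype=float)
    nog = np.asarray(nog, dtype=float)
    n = len(bits)
    b = np.full(n, total_bw / n)  # start from an equal split
    for _ in range(iters):
        step = b - lr * energy_grad(b, bits, T, nog)  # gradient step on total energy
        # Shift by b_min so the lower bound becomes y >= 0, then project.
        b = b_min + project_to_simplex(step - b_min, total_bw - n * b_min)
    return b


# Usage: two coalitions with equal channels; one uploads twice as many Mbit.
b = allocate_bandwidth(bits=[2.0, 4.0], nog=[1.0, 1.0], total_bw=6.0, T=1.0, b_min=0.5)
print(b)  # close to [2, 4]
```

With equal channel quality, the optimum splits bandwidth in proportion to upload volume. Under this energy objective, a larger `nog` (a worse channel) pulls more bandwidth toward that coalition.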