Strategic Client Selection to Address Non-IIDness in HAPS-enabled FL Networks

Amin Farajzadeh, Animesh Yadav, Halim Yanikomeroglu
2024-01-11
Abstract: The deployment of federated learning (FL) within vertical heterogeneous networks, such as those enabled by high-altitude platform station (HAPS), offers the opportunity to engage a wide array of clients, each endowed with distinct communication and computational capabilities. This diversity not only enhances the training accuracy of FL models but also hastens their convergence. Yet, applying FL in these expansive networks presents notable challenges, particularly the significant non-IIDness in client data distributions. Such data heterogeneity often results in slower convergence rates and reduced effectiveness in model training performance. Our study introduces a client selection strategy tailored to address this issue, leveraging user network traffic behaviour. This strategy involves the prediction and classification of clients based on their network usage patterns while prioritizing user privacy. By strategically selecting clients whose data exhibit similar patterns for participation in FL training, our approach fosters a more uniform and representative data distribution across the network. Our simulations demonstrate that this targeted client selection methodology significantly reduces the training loss of FL models in HAPS networks, thereby effectively tackling a crucial challenge in implementing large-scale FL systems.
Networking and Internet Architecture, Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
The paper addresses the slow convergence and degraded training performance that arise in federated learning (FL) networks enabled by high-altitude platform stations (HAPS) when client data are not independent and identically distributed (non-IID). Specifically:

1. **Data heterogeneity problem**:
   - In the vertical heterogeneous network enabled by HAPS, data distributions can differ widely from one client to another. This heterogeneity slows the convergence of the FL model and reduces the effectiveness of training.
   - Schematically:
     \[ \text{Non-IID data} \implies \text{slow convergence},\ \text{poor model training performance} \]
2. **Client selection strategy**:
   - To meet this challenge, the paper proposes a client selection strategy based on user network traffic behavior. Clients are predicted and classified according to their network usage patterns, and clients with similar data distributions are preferentially selected to participate in FL training.
   - Specific steps include:
     - **Feature extraction**: Extract features from historical network traffic data, such as daily usage peaks and preferred applications.
     - **Classification and clustering**: Use these features to classify and cluster clients so that the data distributions of the participating clients are more consistent.
3. **Improvement objectives**:
   - **Improve convergence efficiency**: Selecting clients with similar data distributions makes the global model updates more consistent, which accelerates convergence.
   - **Reduce communication overhead**: Only a subset of similar clients is selected in each training round, which reduces the amount of data exchanged between the server and the clients.
   - **Enhance privacy protection**: Because similar clients are clustered together, individual client data is better protected during aggregation.
   - **Reduce training loss**: Reducing the impact of non-IID data and improving the consistency of local model updates lowers the training loss of the global model.
4. **Experimental verification**:
   - The paper verifies the effectiveness of this strategy through simulation. The results show that, in the HAPS network, the strategy significantly reduces the training loss of the FL model and improves model performance.

In summary, the main objective of the paper is to address the challenges that data heterogeneity poses in HAPS-enabled FL networks through a client selection strategy based on traffic behavior, thereby improving the overall performance of the FL system.
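The feature-extraction and clustering steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not say which prediction or classification method the paper uses, so plain k-means stands in for it here. The two traffic features (peak usage hour and video share of traffic), the rule of picking the largest cluster, and the function names are all assumptions made for illustration.

```python
import math

def standardize(rows):
    """Scale each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has zero spread; divide by 1.0 instead of 0.0.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def kmeans(points, k, n_iter=50):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next centroid: the point farthest from every centroid chosen so far.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))

    def assign():
        # Label each point with the index of its nearest centroid.
        return [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                for p in points]

    for _ in range(n_iter):
        labels = assign()
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign(), centroids

def select_similar_clients(features, k, n_select):
    """Return up to n_select client indices from the largest cluster.

    Assumption: the largest cluster is the one used for FL training, and
    its members closest to the cluster centroid are preferred.
    """
    z = standardize(features)
    labels, centroids = kmeans(z, k)
    sizes = [labels.count(j) for j in range(k)]
    largest = sizes.index(max(sizes))
    members = [i for i, lab in enumerate(labels) if lab == largest]
    members.sort(key=lambda i: math.dist(z[i], centroids[largest]))
    return sorted(members[:n_select]), labels

# Hypothetical per-client features: [peak usage hour, video share of traffic].
features = [
    [9.0, 0.20], [9.5, 0.25], [8.5, 0.15], [10.0, 0.22], [9.2, 0.18], [8.8, 0.21],
    [21.0, 0.80], [20.5, 0.85], [21.5, 0.78], [22.0, 0.82],
]
chosen, labels = select_similar_clients(features, k=2, n_select=3)
print("selected clients:", chosen)
print("cluster labels:  ", labels)
```

In this toy data the six daytime, low-video clients form the larger cluster, so the three selected indices all come from that group. In a real FL round, only the selected clients would receive the global model and return updates. Only aggregate traffic features are used here, which matches the privacy-conscious spirit of the approach but is not a formal privacy guarantee.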