Abstract: The deployment of federated learning (FL) within vertical heterogeneous networks, such as those enabled by high-altitude platform station (HAPS), offers the opportunity to engage a wide array of clients, each endowed with distinct communication and computational capabilities. This diversity not only enhances the training accuracy of FL models but also hastens their convergence. Yet, applying FL in these expansive networks presents notable challenges, particularly the significant non-IIDness in client data distributions. Such data heterogeneity often results in slower convergence rates and reduced effectiveness in model training performance. Our study introduces a client selection strategy tailored to address this issue, leveraging user network traffic behaviour. This strategy involves the prediction and classification of clients based on their network usage patterns while prioritizing user privacy. By strategically selecting clients whose data exhibit similar patterns for participation in FL training, our approach fosters a more uniform and representative data distribution across the network. Our simulations demonstrate that this targeted client selection methodology significantly reduces the training loss of FL models in HAPS networks, thereby effectively tackling a crucial challenge in implementing large-scale FL systems.

What problem does this paper attempt to address?

This paper addresses the slow convergence and poor training performance that arise in federated learning (FL) networks supported by high-altitude platform stations (HAPS) when client data are not independent and identically distributed (non-IID). Specifically:

1. **Data heterogeneity problem**:
   - In the vertical heterogeneous network supported by HAPS, data distributions can differ widely from client to client. This heterogeneity slows the convergence of the FL model and reduces the effectiveness of training.
   - Expressed schematically:
     \[ \text{Non-IID data} \implies \text{slow convergence},\ \text{poor model training performance} \]
2. **Client selection strategy**:
   - To meet this challenge, the paper proposes a client selection strategy based on user network traffic behavior. Clients are predicted and classified according to their network usage patterns, and clients with similar data distributions are preferentially selected to participate in FL training.
   - Specific steps include:
     - **Feature extraction**: Extract features from historical network traffic data, such as daily usage peaks and preferred applications.
     - **Classification and clustering**: Use these features to classify and cluster clients so that the data distributions of participating clients are more consistent.
3. **Improved objectives**:
   - **Improve convergence efficiency**: Selecting clients with similar data distributions makes the global model updates more consistent, which accelerates convergence.
   - **Reduce communication overhead**: Only a subset of similar clients is selected in each training round, which reduces the amount of data exchanged between the server and the clients.
   - **Enhance privacy protection**: Clustering similar clients gives individual client data better protection during aggregation.
   - **Reduce training loss**: Reducing the impact of non-IID data and improving the consistency of local model updates lowers the training loss of the global model.
4. **Experimental verification**:
   - The paper verifies the effectiveness of this strategy through simulation. The results show that, in the HAPS network, the strategy significantly reduces the training loss of the FL model and improves model performance.

In summary, the main objective of this paper is to address the challenges that data heterogeneity brings to HAPS-supported federated learning networks through an intelligent client selection strategy, thereby improving the overall performance of the FL system.
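The feature extraction and clustering steps described above can be illustrated with a minimal sketch. The summary does not specify which prediction or classification models the paper uses, so plain k-means stands in here as an assumption. The feature vectors (a normalised daily peak hour and a video-traffic share), the function names `kmeans` and `select_clients`, and the rule of sampling from the largest cluster are all hypothetical choices made for illustration. The sketch also does not model the privacy-preserving aspect of the strategy.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on lists of floats; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def select_clients(features, k, per_round, seed=0):
    """Cluster clients by traffic features and sample from the largest cluster.

    `features` maps a client id to its (hypothetical) traffic feature vector.
    """
    ids = sorted(features)
    labels = kmeans([features[i] for i in ids], k, seed=seed)
    clusters = {}
    for cid, label in zip(ids, labels):
        clusters.setdefault(label, []).append(cid)
    largest = max(clusters.values(), key=len)
    rng = random.Random(seed)
    return rng.sample(largest, min(per_round, len(largest)))


# Hypothetical features: [normalised daily peak hour, share of video traffic].
features = {
    "c0": [0.90, 0.80], "c1": [0.85, 0.75], "c2": [0.95, 0.82], "c3": [0.88, 0.79],
    "c4": [0.10, 0.20], "c5": [0.15, 0.10],
}
chosen = select_clients(features, k=2, per_round=3)
print(chosen)  # three client ids drawn from the cluster of similar clients
```

In this toy data set the four clients with evening-heavy, video-heavy usage form the larger cluster, so each round draws its participants from that group. A real deployment would need a policy for rotating between clusters so that the global model still sees the remaining clients' data.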

Strategic Client Selection to Address Non-IIDness in HAPS-enabled FL Networks

Client Selection for Federated Learning With Non-IID Data in Mobile Edge Computing

FedAHP: A Heterogeneous Client Selection Method for Federated Learning Based on the Analytic Hierarchy Process in Mobile Edge

Heterogeneity-Guided Client Sampling: Towards Fast and Efficient Non-IID Federated Learning

Joint User Association and Resource Allocation for Wireless Hierarchical Federated Learning with IID and Non-IID Data

DPP-based Client Selection for Federated Learning with Non-IID Data

Hierarchical Federated Learning with Adaptive Clustering on Non-IID Data

Client Selection in Federated Learning: Principles, Challenges, and Opportunities

Secure Hierarchical Federated Learning in Vehicular Networks Using Dynamic Client Selection and Anomaly Detection

Context-Aware Online Client Selection for Hierarchical Federated Learning

An EMD-Based Adaptive Client Selection Algorithm for Federated Learning in Heterogeneous Data Scenarios

Privacy-preserving Data Selection for Horizontal and Vertical Federated Learning

Adaptive client selection with personalization for communication efficient Federated Learning

Pretraining Client Selection Algorithm Based on a Data Distribution Evaluation Model in Federated Learning

A Review of Client Selection Methods in Federated Learning

FedSTS: A Stratified Client Selection Framework for Consistently Fast Federated Learning

Smart client selection strategies for enhanced federated learning in digital healthcare applications

Risk-Aware Accelerated Wireless Federated Learning with Heterogeneous Clients

FedMint: Intelligent Bilateral Client Selection in Federated Learning with Newcomer IoT Devices

Adaptive Idle Model Fusion in Hierarchical Federated Learning for Unbalanced Edge Regions