Submodular Maximization Approaches for Equitable Client Selection in Federated Learning

Andrés Catalino Castillo Jiménez, Ege C. Kaya, Lintao Ye, Abolfazl Hashemi
2024-08-28
Abstract: In a conventional Federated Learning framework, client selection for training typically involves the random sampling of a subset of clients in each iteration. However, this random selection often leads to disparate performance among clients, raising concerns regarding fairness, particularly in applications where equitable outcomes are crucial, such as in medical or financial machine learning tasks. This disparity typically becomes more pronounced with the advent of performance-centric client sampling techniques. This paper introduces two novel methods, namely SUBTRUNC and UNIONFL, designed to address the limitations of random client selection. Both approaches utilize submodular function maximization to achieve more balanced models. By modifying the facility location problem, they aim to mitigate the fairness concerns associated with random selection. SUBTRUNC leverages client loss information to diversify solutions, while UNIONFL relies on historical client selection data to ensure a more equitable performance of the final model. Moreover, these algorithms are accompanied by robust theoretical guarantees regarding convergence under reasonable assumptions. The efficacy of these methods is demonstrated through extensive evaluations across heterogeneous scenarios, revealing significant improvements in fairness as measured by a client dissimilarity metric.
Machine Learning, Artificial Intelligence, Signal Processing, Systems and Control
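The abstract frames both methods as modifications of the facility location problem. For reference, the Python sketch below greedily maximizes a standard facility-location objective, F(S) = Σ_k max_{i∈S} sim[k][i], to pick m of n clients: every client k counts as "covered" by its most similar selected client, so the objective favors a diverse, representative subset rather than a random one. The similarity matrix, the function names and the toy data are illustrative assumptions, not the authors' implementation. Because this F is monotone submodular, the greedy rule is guaranteed to reach at least a (1 - 1/e) fraction of the optimal value under a cardinality constraint, a classical result for submodular maximization.

```python
from typing import List, Sequence


def facility_location(sim: Sequence[Sequence[float]], selected: Sequence[int]) -> float:
    """F(S) = sum_k max_{i in S} sim[k][i], with F(empty set) = 0.

    sim[k][i] >= 0 is a hypothetical similarity between clients k and i
    (e.g. derived from the distance between their model updates).
    """
    if not selected:
        return 0.0
    return sum(max(row[i] for i in selected) for row in sim)


def greedy_select(sim: Sequence[Sequence[float]], m: int) -> List[int]:
    """Greedily pick m clients that (approximately) maximize F."""
    n = len(sim)
    selected: List[int] = []
    # coverage[k] = max_{i in S} sim[k][i] for the current S (0.0 while S is empty,
    # which matches F(empty set) = 0 because similarities are nonnegative).
    coverage = [0.0] * n
    for _ in range(min(m, n)):
        best_j, best_gain = -1, float("-inf")
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain F(S + j) - F(S): only clients that j covers better count.
            gain = sum(max(sim[k][j] - coverage[k], 0.0) for k in range(n))
            if gain > best_gain:  # strict ">" keeps the lowest index on ties
                best_j, best_gain = j, gain
        selected.append(best_j)
        coverage = [max(coverage[k], sim[k][best_j]) for k in range(n)]
    return selected


if __name__ == "__main__":
    # Toy similarities: clients {0, 1} and {2, 3} form two clusters.
    sim = [
        [1.0, 0.9, 0.1, 0.0],
        [0.9, 1.0, 0.0, 0.1],
        [0.1, 0.0, 1.0, 0.8],
        [0.0, 0.1, 0.8, 1.0],
    ]
    print(greedy_select(sim, 2))  # one client from each cluster
```

Tracking `coverage` incrementally keeps each marginal-gain evaluation at O(n) instead of recomputing F from scratch for every candidate.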
### What problems does this paper attempt to solve?

This paper addresses the fairness problem of client selection in Federated Learning (FL). In the conventional FL framework, clients are usually selected by random sampling, which can lead to significant performance differences among clients, especially under high data heterogeneity (for example, in machine-learning tasks in the medical or financial fields). These differences not only affect the convergence of the model but also raise fairness concerns. Specifically, the paper identifies two problems:

1. **Performance imbalance caused by random selection**:
   - When clients are selected at random, the data of some clients contributes more to the model updates than the data of others, so the resulting model performs unevenly across clients.
   - This imbalance is especially pronounced when data distributions differ widely. For example, MRI scans in medical imaging may come from devices made by different manufacturers, producing highly heterogeneous data.
2. **Existing performance-oriented selection methods exacerbate unfairness**:
   - Performance-oriented client selection methods (such as selection based on gradient similarity) can improve the overall performance of the model but often ignore fairness: some clients are selected repeatedly, while others rarely take part in training.

To address these problems, the paper proposes two new methods, **SUBTRUNC** and **UNIONFL**. Both achieve more balanced client selection by maximizing submodular functions, thereby improving the fairness and overall performance of the model.

### Method overview

- **SUBTRUNC**: Adds a truncated submodular function as a regularization term and incorporates client loss information, so that model performance is more balanced across all clients.
- **UNIONFL**: Records historical client selection data and encourages previously unselected clients to participate in training, which diversifies client selection and thereby improves the fairness of the model.

Both methods come with theoretical convergence guarantees under reasonable assumptions and, in the reported evaluations across heterogeneous scenarios, show significant fairness improvements as measured by a client dissimilarity metric.
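The overview above describes the two modifications only at a high level. The Python sketch below shows one way such modifications could be layered on a facility-location objective while keeping it monotone submodular, so that the same greedy routine still applies. The specific forms used here are illustrative assumptions, not the paper's formulations: a client-loss term truncated at a cap `tau` for the SUBTRUNC-like objective, and evaluation of the objective on the union with a historical selection set `history` for the UNIONFL-like objective. All function names and parameters (`subtrunc_style`, `unionfl_style`, `lam`, `tau`) are hypothetical.

```python
from typing import Callable, FrozenSet, List, Sequence

SetFn = Callable[[FrozenSet[int]], float]


def facility_location(sim: Sequence[Sequence[float]]) -> SetFn:
    """Return F(S) = sum_k max_{i in S} sim[k][i], with F(empty set) = 0."""
    def f(s: FrozenSet[int]) -> float:
        return sum(max(row[i] for i in s) for row in sim) if s else 0.0
    return f


def subtrunc_style(sim: Sequence[Sequence[float]], losses: Sequence[float],
                   lam: float, tau: float) -> SetFn:
    """Hypothetical SUBTRUNC-like objective: F(S) + lam * min(sum of losses in S, tau).

    The truncated loss term favors high-loss clients but saturates at tau;
    for nonnegative losses it is monotone submodular.
    """
    f = facility_location(sim)
    return lambda s: f(s) + lam * min(sum(losses[i] for i in s), tau)


def unionfl_style(sim: Sequence[Sequence[float]], history: FrozenSet[int]) -> SetFn:
    """Hypothetical UNIONFL-like objective: F(S | H) for past selections H.

    Clients already well covered by H add little gain, so clients that
    were rarely selected before become more attractive.
    """
    f = facility_location(sim)
    return lambda s: f(s | history)


def greedy(objective: SetFn, n: int, m: int) -> List[int]:
    """Pick m of n clients by repeatedly adding the largest marginal gain."""
    chosen: List[int] = []
    current: FrozenSet[int] = frozenset()
    for _ in range(min(m, n)):
        base = objective(current)
        best_j, best_gain = -1, float("-inf")
        for j in range(n):
            if j in current:
                continue
            gain = objective(current | {j}) - base
            if gain > best_gain:  # strict ">" keeps the lowest index on ties
                best_j, best_gain = j, gain
        chosen.append(best_j)
        current = current | {best_j}
    return chosen


if __name__ == "__main__":
    sim = [
        [1.0, 0.9, 0.1, 0.0],
        [0.9, 1.0, 0.0, 0.1],
        [0.1, 0.0, 1.0, 0.8],
        [0.0, 0.1, 0.8, 1.0],
    ]
    # Client 3 has a much higher loss, so it is picked first.
    print(greedy(subtrunc_style(sim, [0.1, 0.1, 0.2, 2.0], lam=1.0, tau=1.5), 4, 2))
    # Clients 0 and 2 were selected in earlier rounds.
    print(greedy(unionfl_style(sim, frozenset({0, 2})), 4, 2))  # → [3, 1]
```

The design choice here is to preserve submodularity: min(·, τ) applied to a nonnegative modular function is monotone submodular, and so is F(S ∪ H) viewed as a function of S. Their sums with F therefore remain monotone submodular, and greedy selection keeps its (1 - 1/e) approximation guarantee.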