Pollen: High-throughput Federated Learning Simulation via Resource-Aware Client Placement

Lorenzo Sani, Pedro Porto Buarque de Gusmão, Alex Iacob, Wanru Zhao, Xinchi Qiu, Yan Gao, Javier Fernandez-Marques, Nicholas Donald Lane
2024-05-20
Abstract: Federated Learning (FL) is a privacy-focused machine learning paradigm that collaboratively trains models directly on edge devices. Simulation plays an essential role in FL adoption, helping develop novel aggregation and client sampling strategies. However, current simulators cannot emulate large-scale systems in a time-efficient manner, which limits their utility and casts doubts on generalizability.
Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
This paper addresses the inability of existing simulators to run large-scale Federated Learning (FL) simulations efficiently, which limits their practicality and calls the generality of their results into question. Specifically, the paper identifies two main limiting factors in current simulators:

1. **Low communication efficiency**: Client execution is pull-based, which makes communication inefficient.
2. **Insufficient load balancing**: On heterogeneous hardware, work cannot be balanced effectively across devices.

These problems may have little impact on small- or medium-scale simulations, but at large scale they make experiments excessively long, to the point of being impractical or infeasible. For example, some experiments may take weeks or months to complete.

To solve these problems, the paper proposes Pollen, a new resource-aware system that aims to accelerate large-scale FL simulations by:

- **Adopting a push-based client placement system** to improve communication efficiency.
- **Learning an adaptive client scheduling strategy** that schedules clients according to hardware statistics.
- **Estimating the optimal number of concurrent worker processes for each GPU** to make full use of GPU resources.

Pollen's design and placement model enable it to significantly reduce execution time, from months to weeks, thus supporting practical research on large-scale federated learning systems without high hardware costs.
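To make the idea of resource-aware, push-based placement concrete, here is a minimal Python sketch. It is not Pollen's actual algorithm: it substitutes a simple greedy heuristic (largest client first, assigned to the worker with the earliest projected finish time), and the names `place_clients`, `client_sizes`, and `worker_speeds` are hypothetical. It assumes each client's cost is proportional to its number of samples and that each worker's throughput (samples per second) has already been measured.

```python
def place_clients(client_sizes, worker_speeds):
    """Greedy placement sketch (an assumption, not the paper's method).

    client_sizes:  {client_id: number of training samples}
    worker_speeds: {worker_id: measured throughput in samples/second}
    Returns (placement, finish), where placement maps each worker to the
    ordered list of clients pushed to it and finish maps each worker to
    its projected total busy time in seconds.
    """
    finish = {w: 0.0 for w in worker_speeds}
    placement = {w: [] for w in worker_speeds}

    # Handle the largest clients first so small ones can fill the gaps.
    ordered = sorted(client_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    for client_id, size in ordered:
        # Pick the worker that would complete this client earliest,
        # accounting for both its current load and its speed.
        best = min(
            worker_speeds,
            key=lambda w: (finish[w] + size / worker_speeds[w], w),
        )
        placement[best].append(client_id)
        finish[best] += size / worker_speeds[best]
    return placement, finish


if __name__ == "__main__":
    clients = {"c0": 400, "c1": 300, "c2": 200, "c3": 100}
    workers = {"fast_gpu": 200.0, "slow_gpu": 100.0}
    placement, finish = place_clients(clients, workers)
    print(placement)
    print(finish)
```

Because the whole assignment is computed up front from hardware statistics, the server can push each worker its full client list in one step instead of having idle workers pull clients one at a time. The faster worker receives more clients, so the projected finish times of the two workers stay close.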