pfl-research: simulation framework for accelerating research in Private Federated Learning

Filip Granqvist, Congzheng Song, Áine Cahill, Rogier van Dalen, Martin Pelikan, Yi Sheng Chan, Xiaojun Feng, Natarajan Krishnaswami, Vojta Jina, Mona Chitnis
2024-04-10
Abstract: Federated learning (FL) is an emerging machine learning (ML) training paradigm where clients own their data and collaborate to train a global model, without revealing any data to the server and other participants. Researchers commonly perform experiments in a simulation environment to quickly iterate on ideas. However, existing open-source tools do not offer the efficiency required to simulate FL on larger and more realistic FL datasets. We introduce pfl-research, a fast, modular, and easy-to-use Python framework for simulating FL. It supports TensorFlow, PyTorch, and non-neural network models, and is tightly integrated with state-of-the-art privacy algorithms. We study the speed of open-source FL frameworks and show that pfl-research is 7-72$\times$ faster than alternative open-source frameworks on common cross-device setups. Such speedup will significantly boost the productivity of the FL research community and enable testing hypotheses on realistic FL datasets that were previously too resource intensive. We release a suite of benchmarks that evaluates an algorithm's overall performance on a diverse set of realistic scenarios. The code is available on GitHub at
Machine Learning, Artificial Intelligence, Cryptography and Security, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The problem this paper attempts to solve: existing open-source tools are too inefficient to simulate Federated Learning (FL) on larger-scale, more realistic FL datasets. Specifically:

1. **Efficiency problems of existing tools**:
   - Existing open-source frameworks are slow when simulating FL, making it difficult for researchers to run experiments on large-scale, realistic datasets.
   - This inefficiency limits researcher productivity, making hypothesis testing resource-intensive and time-consuming.
2. **The gap between research needs and practical applications**:
   - Because of privacy, bandwidth, or other compliance issues, a real FL deployment environment is hard to obtain, so most researchers cannot evaluate algorithms directly in one.
   - Even researchers who have such access are limited by user-experience constraints, so efficient simulation tools are needed to accelerate research.
3. **Lack of integrated privacy-protection mechanisms**:
   - Existing FL frameworks usually do not integrate advanced privacy-protection technologies such as Differential Privacy (DP) and Secure Aggregation, which limits research on Private Federated Learning (PFL).
4. **The need to support multiple models and frameworks**:
   - Existing FL frameworks usually support only specific deep-learning frameworks (such as TensorFlow or PyTorch) and have limited support for non-neural-network models.

To solve these problems, the paper introduces `pfl-research`, a fast, modular, and easy-to-use Python framework designed for simulating FL and PFL training. Its main contributions include:

- **Speed improvement**: Compared with other open-source frameworks, `pfl-research` is 7 to 72 times faster in common cross-device settings.
- **Ease of use in distributed simulation**: It provides a seamless transition from single-process to distributed simulation and simplifies debugging, testing, and performance analysis.
- **Privacy-protection integration**: It tightly integrates state-of-the-art privacy-protection mechanisms, making it easier for researchers to experiment with PFL.
- **Support for multiple models and frameworks**: It supports not only TensorFlow and PyTorch but also non-neural-network models, such as Federated Gradient Boosting Decision Tree (GBDT) and Federated Gaussian Mixture Model (GMM).
- **Diverse benchmark tests**: It provides a suite of benchmarks covering different fields, IID/non-IID distributions, no-DP/centralized-DP, and other scenarios, so that algorithms are evaluated comprehensively.

Through these improvements, `pfl-research` significantly improves the productivity of the FL research community, enabling researchers to verify hypotheses on larger-scale and more realistic datasets.
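To make concrete what "simulating PFL" involves, here is a minimal pure-Python sketch of federated averaging with central DP: each sampled client computes a local update, the server clips each update's L2 norm, adds Gaussian noise to the sum, and averages. It does not use the `pfl-research` API. Every name here (`local_update`, `clip_by_l2_norm`, `simulate`) and every default value is a hypothetical chosen for illustration, and `noise_stddev` is not calibrated to any particular (ε, δ) guarantee.

```python
import math
import random


def local_update(weights, data, lr=0.1):
    """One gradient step on a client's least-squares loss; returns the model delta."""
    dim = len(weights)
    grad = [0.0] * dim
    for x, y in data:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for j in range(dim):
            grad[j] += 2.0 * err * x[j] / len(data)
    return [-lr * g for g in grad]


def clip_by_l2_norm(delta, clip_bound):
    """Scale the update so that its L2 norm is at most clip_bound."""
    norm = math.sqrt(sum(d * d for d in delta))
    scale = min(1.0, clip_bound / max(norm, 1e-12))
    return [d * scale for d in delta]


def simulate(num_clients=100, cohort_size=20, rounds=50, dim=3,
             clip_bound=1.0, noise_stddev=0.0, seed=0):
    """Simulate FL on synthetic linear-regression data (illustrative only)."""
    rng = random.Random(seed)
    true_w = [rng.gauss(0.0, 1.0) for _ in range(dim)]

    # Each simulated client holds its own small dataset.
    clients = []
    for _ in range(num_clients):
        data = []
        for _ in range(10):
            x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            y = sum(w * xi for w, xi in zip(true_w, x)) + rng.gauss(0.0, 0.01)
            data.append((x, y))
        clients.append(data)

    w = [0.0] * dim
    for _ in range(rounds):
        # Sample a cohort of clients for this round.
        cohort = rng.sample(range(num_clients), cohort_size)
        total = [0.0] * dim
        for cid in cohort:
            delta = clip_by_l2_norm(local_update(w, clients[cid]), clip_bound)
            total = [t + d for t, d in zip(total, delta)]
        # Central DP: Gaussian noise is added to the sum of clipped updates.
        total = [t + rng.gauss(0.0, noise_stddev) for t in total]
        w = [wi + t / cohort_size for wi, t in zip(w, total)]
    return w, true_w
```

Calling `simulate(noise_stddev=0.0)` runs the loop without DP noise, while a positive `noise_stddev` shows how noise on the aggregate trades accuracy for privacy. The sequential inner loop over the cohort is the part a simulator must make fast, for example by spreading clients across worker processes.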