Abstract: Federated learning (FL) is an emerging machine learning (ML) training paradigm where clients own their data and collaborate to train a global model, without revealing any data to the server and other participants. Researchers commonly perform experiments in a simulation environment to quickly iterate on ideas. However, existing open-source tools do not offer the efficiency required to simulate FL on larger and more realistic FL datasets. We introduce pfl-research, a fast, modular, and easy-to-use Python framework for simulating FL. It supports TensorFlow, PyTorch, and non-neural network models, and is tightly integrated with state-of-the-art privacy algorithms. We study the speed of open-source FL frameworks and show that pfl-research is 7-72$\times$ faster than alternative open-source frameworks on common cross-device setups. Such speedup will significantly boost the productivity of the FL research community and enable testing hypotheses on realistic FL datasets that were previously too resource intensive. We release a suite of benchmarks that evaluates an algorithm's overall performance on a diverse set of realistic scenarios. The code is available on GitHub at

What problem does this paper attempt to address?

The problem this paper attempts to solve is that existing open-source tools are inefficient at simulating Federated Learning (FL) and cannot efficiently handle larger-scale, more realistic FL datasets. Specifically:

1. **Efficiency problems of existing tools**
   - Existing open-source frameworks are slow when simulating FL, making it difficult for researchers to run experiments on large-scale, realistic datasets.
   - This inefficiency limits researcher productivity and makes hypothesis testing resource-intensive and time-consuming.
2. **The gap between research needs and practical applications**
   - Because of privacy, bandwidth, or other compliance issues, a real FL deployment environment is difficult to obtain, so most researchers cannot evaluate algorithms directly in a real environment.
   - Even researchers who do have access are limited by user experience constraints, so efficient simulation tools are needed to accelerate the research process.
3. **Lack of integrated privacy-protection mechanisms**
   - Existing FL frameworks usually do not integrate advanced privacy-protection technologies, such as Differential Privacy (DP) and Secure Aggregation, which limits research on Private Federated Learning (PFL).
4. **The need to support multiple models and frameworks**
   - Existing FL frameworks usually support only specific deep-learning frameworks (such as TensorFlow or PyTorch) and have limited support for non-neural-network models.

To solve these problems, the paper introduces `pfl-research`, a fast, modular, and easy-to-use Python framework designed specifically for simulating FL and PFL training. The main contributions of `pfl-research` include:

- **Speed improvement**: Compared with other open-source frameworks, `pfl-research` is 7 to 72 times faster in common cross-device settings.
- **Ease of use in distributed simulation**: It provides a seamless transition from single-process to distributed simulation and simplifies debugging, testing, and performance analysis.
- **Privacy-protection integration**: It tightly integrates state-of-the-art privacy-protection mechanisms, making it easier for researchers to experiment with PFL.
- **Support for multiple models and frameworks**: It supports not only TensorFlow and PyTorch but also non-neural-network models, such as Federated Gradient Boosting Decision Tree (GBDT) and Federated Gaussian Mixture Model (GMM).
- **Diverse benchmarks**: It provides a suite of benchmarks covering different domains, IID/non-IID distributions, and no-DP/central-DP scenarios, so that algorithms are evaluated across a broad set of conditions.

Through these improvements, `pfl-research` aims to significantly improve the productivity of the FL research community, enabling researchers to test hypotheses on larger-scale and more realistic datasets.
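To make the idea of simulating cross-device FL with central DP more concrete, here is a minimal, framework-independent sketch of one common recipe: federated averaging in which each client's model update is clipped and Gaussian noise is added to the aggregate. It is an illustration only and does not use or reflect the `pfl-research` API. Every name in it (`local_update`, `clip_by_l2_norm`, `simulate_round`, `run_simulation`), the toy linear-regression task, and all hyperparameter values are assumptions chosen for the example. The `noise_stddev` value is a placeholder, and no privacy accounting is performed.

```python
import math
import random


def local_update(weights, examples, lr=0.1):
    """One gradient step of least-squares regression on one client's data.

    Returns the model delta (new weights minus the starting weights).
    """
    dim = len(weights)
    grad = [0.0] * dim
    for x, y in examples:
        residual = sum(w * xi for w, xi in zip(weights, x)) - y
        for j in range(dim):
            grad[j] += residual * x[j] / len(examples)
    return [-lr * g for g in grad]


def clip_by_l2_norm(delta, clip_norm):
    """Scale the delta down so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(d * d for d in delta))
    scale = min(1.0, clip_norm / max(norm, 1e-12))
    return [d * scale for d in delta]


def simulate_round(weights, clients, rng, cohort_size=10,
                   clip_norm=1.0, noise_stddev=0.0):
    """One simulated round: sample a cohort, clip, sum, add noise, average."""
    cohort = rng.sample(range(len(clients)), cohort_size)
    total = [0.0] * len(weights)
    for idx in cohort:
        delta = clip_by_l2_norm(local_update(weights, clients[idx]), clip_norm)
        total = [t + d for t, d in zip(total, delta)]
    # Central-DP-style step: Gaussian noise is added to the summed deltas.
    # noise_stddev is a placeholder; no privacy budget is computed here.
    noisy = [t + rng.gauss(0.0, noise_stddev) for t in total]
    return [w + n / cohort_size for w, n in zip(weights, noisy)]


def make_clients(rng, true_weights, num_clients=50, samples_per_client=20):
    """Build synthetic per-client datasets for a toy linear-regression task."""
    clients = []
    for _ in range(num_clients):
        examples = []
        for _ in range(samples_per_client):
            x = [rng.gauss(0.0, 1.0) for _ in true_weights]
            y = sum(w * xi for w, xi in zip(true_weights, x))
            examples.append((x, y))
        clients.append(examples)
    return clients


def run_simulation(num_rounds=200, noise_stddev=0.0, seed=0):
    """Run the toy simulation and return (learned weights, true weights)."""
    rng = random.Random(seed)
    true_weights = [1.0, -2.0, 3.0]
    clients = make_clients(rng, true_weights)
    weights = [0.0] * len(true_weights)
    for _ in range(num_rounds):
        weights = simulate_round(weights, clients, rng,
                                 noise_stddev=noise_stddev)
    return weights, true_weights
```

Calling `run_simulation(noise_stddev=0.0)` trains without noise, while a positive `noise_stddev` shows how the added noise perturbs the learned weights. A simulator used for real research additionally has to handle large models, many users per round, and multiple accelerators, which is where the efficiency concerns described above come from.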

pfl-research: simulation framework for accelerating research in Private Federated Learning
