FedEval: A Holistic Evaluation Framework for Federated Learning

Di Chai, Leye Wang, Liu Yang, Junxue Zhang, Kai Chen, Qiang Yang
DOI: https://doi.org/10.48550/arXiv.2011.09655
2022-12-25
Abstract: Federated Learning (FL) has been widely accepted as the solution for privacy-preserving machine learning without collecting raw data. While new technologies proposed in the past few years do evolve the FL area, unfortunately, the evaluation results presented in these works fall short in integrity and are hardly comparable because of the inconsistent evaluation metrics and experimental settings. In this paper, we propose a holistic evaluation framework for FL called FedEval, and present a benchmarking study on seven state-of-the-art FL algorithms. Specifically, we first introduce the core evaluation taxonomy model, called FedEval-Core, which covers four essential evaluation aspects for FL: Privacy, Robustness, Effectiveness, and Efficiency, with various well-defined metrics and experimental settings. Based on the FedEval-Core, we further develop an FL evaluation platform with standardized evaluation settings and easy-to-use interfaces. We then provide an in-depth benchmarking study between the seven well-known FL algorithms, including FedSGD, FedAvg, FedProx, FedOpt, FedSTC, SecAgg, and HEAgg. We comprehensively analyze the advantages and disadvantages of these algorithms and further identify the suitable practical scenarios for different algorithms, which is rarely done by prior work. Lastly, we excavate a set of take-away insights and future research directions, which are very helpful for researchers in the FL area.
Machine Learning; Cryptography and Security; Distributed, Parallel, and Cluster Computing; Performance
What problem does this paper attempt to address?
Existing Federated Learning (FL) techniques have made progress in privacy protection, but the way they are evaluated is deficient in two respects:

1. **Lack of Completeness in Evaluation Results**: Different research works focus on different evaluation aspects, so their results do not reflect the overall performance of an algorithm.
2. **Incomparability of Evaluation Results**: Because evaluation metrics and experimental settings are inconsistent, the results of different studies cannot be compared directly.

To solve these problems, the authors propose a comprehensive evaluation framework, FedEval, which aims to provide a standardized, comparable, and comprehensive evaluation tool for Federated Learning. With this framework, the advantages and disadvantages of existing FL algorithms can be evaluated more accurately, and the results can guide future research.

### Main Problem Summary

- **Insufficient Evaluation Completeness**: Existing research usually evaluates only the specific aspect it improves and ignores other important metrics.
- **Incomparability of Evaluation Results**: Different evaluation metrics and experimental settings make it difficult to compare the results of different studies directly.
- **Lack of a Standardized Evaluation Platform**: Researchers must implement evaluation metrics and collect results by hand, which increases the workload and can introduce bias.

### Solutions

The solutions proposed by the authors include:

1. **Design a Comprehensive Evaluation Framework (FedEval-Core)**:
   - Propose a core model covering four key evaluation aspects: Privacy, Robustness, Effectiveness, and Efficiency.
   - Define detailed evaluation methods and metrics for each aspect.
2. **Develop a Standardized Evaluation Platform**:
   - Implement a lightweight, easy-to-use evaluation platform that provides standardized evaluation settings and a user-friendly interface.
   - Through this platform, users can evaluate new Federated Learning algorithms or test new attack and defense methods.
3. **Benchmarking Research**:
   - Conduct in-depth benchmarking of seven state-of-the-art Federated Learning algorithms (FedSGD, FedAvg, FedProx, FedOpt, FedSTC, SecAgg, and HEAgg), analyze their strengths and weaknesses, and identify which algorithms suit which application scenarios.

Through these measures, FedEval provides more comprehensive and comparable evaluation results, along with insights and future research directions for the Federated Learning field.
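
To make the completeness and comparability problems concrete, the sketch below shows one way results could be organized around the four aspects named above. It is a minimal illustration and not FedEval's actual API: the `EvaluationReport` class, the `comparable` helper, and the metric name `test_accuracy` are all hypothetical, and the recorded value is a placeholder rather than a reported result.

```python
from dataclasses import dataclass, field

# The four aspects come from the summary above; everything else is hypothetical.
ASPECTS = ("privacy", "robustness", "effectiveness", "efficiency")


@dataclass
class EvaluationReport:
    """Hypothetical container for one algorithm's results (not FedEval's API)."""

    algorithm: str
    # Maps aspect name -> {metric name -> value}.
    metrics: dict = field(default_factory=dict)

    def record(self, aspect: str, metric: str, value: float) -> None:
        """Store one metric value under one of the four aspects."""
        if aspect not in ASPECTS:
            raise ValueError(f"unknown aspect: {aspect!r}")
        self.metrics.setdefault(aspect, {})[metric] = value

    def missing_aspects(self) -> list:
        """Aspects with no recorded metric, i.e. gaps in completeness."""
        return [a for a in ASPECTS if not self.metrics.get(a)]


def comparable(a: EvaluationReport, b: EvaluationReport) -> bool:
    """Treat two reports as comparable only if they cover the same metrics per aspect."""
    return all(
        set(a.metrics.get(x, {})) == set(b.metrics.get(x, {})) for x in ASPECTS
    )


report = EvaluationReport("FedAvg")
report.record("effectiveness", "test_accuracy", 0.0)  # placeholder, not a real result
print(report.missing_aspects())  # ['privacy', 'robustness', 'efficiency']
```

A report that covers only one aspect is flagged as incomplete, and two reports built from different metric sets are flagged as not comparable, which mirrors the two deficiencies the summary identifies.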