Abstract: Federated Learning (FL) has been widely accepted as the solution for privacy-preserving machine learning without collecting raw data. While new technologies proposed in the past few years do evolve the FL area, unfortunately, the evaluation results presented in these works fall short in integrity and are hardly comparable because of the inconsistent evaluation metrics and experimental settings. In this paper, we propose a holistic evaluation framework for FL called FedEval, and present a benchmarking study on seven state-of-the-art FL algorithms. Specifically, we first introduce the core evaluation taxonomy model, called FedEval-Core, which covers four essential evaluation aspects for FL: Privacy, Robustness, Effectiveness, and Efficiency, with various well-defined metrics and experimental settings. Based on the FedEval-Core, we further develop an FL evaluation platform with standardized evaluation settings and easy-to-use interfaces. We then provide an in-depth benchmarking study between the seven well-known FL algorithms, including FedSGD, FedAvg, FedProx, FedOpt, FedSTC, SecAgg, and HEAgg. We comprehensively analyze the advantages and disadvantages of these algorithms and further identify the suitable practical scenarios for different algorithms, which is rarely done by prior work. Lastly, we excavate a set of take-away insights and future research directions, which are very helpful for researchers in the FL area.

What problem does this paper attempt to address?

The paper addresses a gap in how Federated Learning (FL) techniques are evaluated. Although existing FL techniques have made progress in privacy protection, their evaluation methods are deficient in two ways:

1. **Lack of Completeness in Evaluation Results**: Different research works focus on different evaluation aspects, so the reported results do not fully reflect the overall performance of an algorithm.
2. **Incomparability of Evaluation Results**: Because evaluation metrics and experimental settings are inconsistent, results from different studies are difficult to compare directly.

To solve these problems, the authors propose a comprehensive evaluation framework, FedEval, which aims to provide a standardized, comparable, and comprehensive evaluation tool for Federated Learning. With this framework, the advantages and disadvantages of existing Federated Learning algorithms can be evaluated more accurately, and guidance can be provided for future research.

### Main Problem Summary

- **Insufficient Evaluation Completeness**: Existing research usually evaluates only the specific aspects it improves while ignoring other important metrics.
- **Incomparability of Evaluation Results**: Different evaluation metrics and experimental settings make it difficult to directly compare the results of different studies.
- **Lack of a Standardized Evaluation Platform**: Researchers need to manually implement evaluation metrics and collect results, which increases the workload and is prone to introducing bias.

### Solutions

The solutions proposed by the authors include:

1. **Design a Comprehensive Evaluation Framework (FedEval-Core)**:
   - Propose a core model covering four key evaluation aspects: Privacy, Robustness, Effectiveness, and Efficiency.
   - Define detailed evaluation methods and metrics for each aspect.
2. **Develop a Standardized Evaluation Platform**:
   - Implement a lightweight and easy-to-use evaluation platform that provides standardized evaluation settings and a user-friendly interface.
   - Through this platform, users can easily evaluate new Federated Learning algorithms or test new attack and defense methods.
3. **Benchmarking Research**:
   - Conduct in-depth benchmarking of seven state-of-the-art Federated Learning algorithms, analyze their strengths and weaknesses, and identify algorithms suitable for different application scenarios.

Through these measures, FedEval can provide more comprehensive and comparable evaluation results, and it also offers insights and future research directions for the Federated Learning field.
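To make the idea of complete and comparable evaluation concrete, the following is a minimal Python sketch, not FedEval's actual API. Only the four aspect names (Privacy, Robustness, Effectiveness, Efficiency) come from the summary above. The `EvalRecord` class, the `settings_id` field, the `compare` helper, the algorithm names, and all score values are hypothetical placeholders chosen for illustration (higher is assumed to be better), and none of them are results from the paper.

```python
from dataclasses import dataclass

# The four aspect names come from the summary above. Everything else here
# (class name, score scale, helper function) is a hypothetical illustration
# and does not reflect FedEval's real interfaces.
ASPECTS = ("privacy", "robustness", "effectiveness", "efficiency")


@dataclass(frozen=True)
class EvalRecord:
    algorithm: str
    settings_id: str       # identifies the shared experimental settings
    privacy: float         # placeholder score, higher is assumed better
    robustness: float
    effectiveness: float
    efficiency: float


def compare(records):
    """Return, for each aspect, the algorithm with the highest score.

    Raises ValueError when the records were produced under different
    experimental settings, because such results are not directly comparable.
    """
    if not records:
        raise ValueError("no records to compare")
    settings = {r.settings_id for r in records}
    if len(settings) != 1:
        raise ValueError(f"mixed experimental settings: {sorted(settings)}")
    return {
        aspect: max(records, key=lambda r: getattr(r, aspect)).algorithm
        for aspect in ASPECTS
    }


# Placeholder values only; these are not measurements of any real algorithm.
records = [
    EvalRecord("algo_a", "setting-1", privacy=0.2, robustness=0.7,
               effectiveness=0.9, efficiency=0.4),
    EvalRecord("algo_b", "setting-1", privacy=0.8, robustness=0.6,
               effectiveness=0.7, efficiency=0.3),
]
best = compare(records)
print(best)
```

Requiring every record to carry all four aspects mirrors the completeness goal, and refusing to compare records with different `settings_id` values mirrors the comparability goal. A real platform would define concrete metrics per aspect rather than a single score.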

FedEval: A Holistic Evaluation Framework for Federated Learning

A Survey for Federated Learning Evaluations: Goals and Measures

FedPSE: Personalized Sparsification with Element-wise Aggregation for Federated Learning

FedDGP: Disentangling Global and Personal Models for Federated Learning

A Survey of Federated Evaluation in Federated Learning

Not All Federated Learning Algorithms Are Created Equal: A Performance Evaluation Study

HQsFL: A Novel Training Strategy for Constructing High-performance and Quantum-safe Federated Learning

Holistic Evaluation Metrics: Use Case Sensitive Evaluation Metrics for Federated Learning

Advancements in Federated Learning: Models, Methods, and Privacy

A Generalized Look at Federated Learning: Survey and Perspectives

FedEval-LLM: Federated Evaluation of Large Language Models on Downstream Tasks with Collective Wisdom

Contribution Evaluation in Federated Learning: Examining Current Approaches

Federated Learning in Practice: Reflections and Projections

A Framework for testing Federated Learning algorithms using an edge-like environment

FedML: A Research Library and Benchmark for Federated Machine Learning

Advances in APPFL: A Comprehensive and Extensible Federated Learning Framework

A Survey on Contribution Evaluation in Vertical Federated Learning

FedCV: A Federated Learning Framework for Diverse Computer Vision Tasks

EPFed: Achieving Optimal Balance between Privacy and Efficiency in Federated Learning

DeFTA: A Plug-and-Play Peer-to-Peer Decentralized Federated Learning Framework

A Fair Federated Learning Framework with Reinforcement Learning