Abstract: Federated learning (FL) has shown promising potential in safeguarding data privacy in healthcare collaborations. While the term "FL" was originally coined by the engineering community, the statistical field has also explored similar privacy-preserving algorithms. Statistical FL algorithms, however, remain considerably less recognized than their engineering counterparts. Our goal was to bridge the gap by presenting the first comprehensive comparison of FL frameworks from both engineering and statistical domains. We evaluated five FL frameworks using both simulated and real-world data. The results indicate that statistical FL algorithms yield less biased point estimates for model coefficients and offer convenient confidence interval estimations. In contrast, engineering-based methods tend to generate more accurate predictions, sometimes surpassing central pooled and statistical FL models. This study underscores the relative strengths and weaknesses of both types of methods, emphasizing the need for increased awareness and their integration in future FL applications.

What problem does this paper attempt to address?

The main problem this paper addresses is: **comparing and evaluating the performance of engineering and statistical federated learning (FL) frameworks on clinical structured data**. Specifically, the research aims to fill two gaps:

1. **Comparison between engineering and statistical methods**: The concept of federated learning was first proposed by the engineering community and has performed well in prediction tasks. The statistics field has developed similar privacy-preserving algorithms, which have particular advantages in point estimation and confidence interval estimation. However, statistical methods have received little attention in medical applications. By comparing both types of FL frameworks, the paper aims to provide a selection guide for future research.
2. **Importance of non-prediction tasks**: Beyond prediction, the medical field also needs accurate estimates of the association between factors and clinical outcomes (i.e., point estimation), which is crucial for designing interventions and allocating resources. Most existing research focuses on prediction performance and overlooks non-prediction tasks. The paper emphasizes the differences between these two types of tasks and how they affect the choice of FL framework.

### Research Objectives

- **Evaluate five FL frameworks**: GLORE, FedAvg, FedAvgM, q-FedAvg, and FedProx.
- **Use simulated and real-world data**: Run controlled experiments on simulated data to assess the bias of model parameter estimates and the accuracy of confidence intervals; use real-world electronic health record (EHR) data to evaluate prediction performance.
- **Provide practical suggestions**: Based on the experimental results, give future researchers guidance on selecting FL frameworks suited to different types of clinical tasks.

### Main Findings

- **Advantages of statistical methods**: For point estimation and confidence interval estimation, statistical methods (such as GLORE) yield less biased coefficient estimates and offer convenient confidence interval estimation.
- **Advantages of engineering methods**: In prediction tasks, engineering methods (such as q-FedAvg) can sometimes achieve higher prediction accuracy than centralized models.
- **Communication cost**: GLORE is more communication-efficient and usually requires fewer communication rounds to converge.

### Conclusion

Through systematic benchmarking, the paper identifies the respective strengths and weaknesses of engineering and statistical FL frameworks and provides a reference for future clinical research. For non-prediction tasks, it recommends giving priority to statistical methods; for prediction tasks, engineering methods can be considered for better predictive performance. The paper also raises the possibility of integrating the two approaches to further improve FL applications in the future.
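The contrast between a statistical framework such as GLORE and an engineering framework such as FedAvg can be illustrated with a small sketch. This is not the paper's implementation. It assumes a plain logistic regression, simulated IID data at three hypothetical sites, and full-batch local updates, and all function names and hyperparameters here are illustrative. The GLORE-style fit shares only per-site gradients and Hessians and takes Newton steps on their sums, so it reproduces the pooled estimate and yields Wald standard errors. The FedAvg-style fit averages locally updated coefficients, weighted by site size, and returns no standard errors.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def simulate_sites(seed=0, sizes=(400, 600, 1000), beta_true=(-0.5, 1.0, -2.0)):
    """Simulate IID logistic-regression data at several hypothetical sites."""
    rng = np.random.default_rng(seed)
    beta_true = np.asarray(beta_true)
    sites = []
    for n in sizes:
        X = np.column_stack([np.ones(n), rng.normal(size=(n, len(beta_true) - 1))])
        y = rng.binomial(1, sigmoid(X @ beta_true)).astype(float)
        sites.append((X, y))
    return sites, beta_true

def local_summaries(X, y, beta):
    """Per-site gradient and information matrix (X^T W X) of the log-likelihood."""
    p = sigmoid(X @ beta)
    grad = X.T @ (y - p)
    hess = (X * (p * (1.0 - p))[:, None]).T @ X
    return grad, hess

def glore_style_fit(sites, n_rounds=25, tol=1e-10):
    """Newton-Raphson on summed site summaries; only aggregates leave each site."""
    beta = np.zeros(sites[0][0].shape[1])
    for _ in range(n_rounds):
        grads, hessians = zip(*(local_summaries(X, y, beta) for X, y in sites))
        step = np.linalg.solve(sum(hessians), sum(grads))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Wald standard errors from the inverse information at the final estimate.
    info = sum(local_summaries(X, y, beta)[1] for X, y in sites)
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se

def fedavg_fit(sites, n_rounds=200, local_steps=5, lr=0.5):
    """FedAvg-style training: local gradient steps, then size-weighted averaging."""
    beta = np.zeros(sites[0][0].shape[1])
    sizes = np.array([len(y) for _, y in sites], dtype=float)
    for _ in range(n_rounds):
        local_betas = []
        for X, y in sites:
            b = beta.copy()
            for _ in range(local_steps):
                # Gradient ascent on the site's mean log-likelihood.
                b += lr * X.T @ (y - sigmoid(X @ b)) / len(y)
            local_betas.append(b)
        beta = np.average(local_betas, axis=0, weights=sizes)
    return beta

sites, beta_true = simulate_sites()
X_all = np.vstack([X for X, _ in sites])
y_all = np.concatenate([y for _, y in sites])

beta_pooled, _ = glore_style_fit([(X_all, y_all)])   # centralized reference fit
beta_glore, se_glore = glore_style_fit(sites)        # federated, summaries only
beta_fedavg = fedavg_fit(sites)                      # federated, averaged weights

ci_lower = beta_glore - 1.96 * se_glore              # 95% Wald interval bounds
ci_upper = beta_glore + 1.96 * se_glore
print("pooled :", beta_pooled)
print("GLORE  :", beta_glore, "SE:", se_glore)
print("FedAvg :", beta_fedavg)
```

Because the summed gradients and Hessians equal those of the pooled data, the GLORE-style estimate matches the centralized fit up to floating-point error. The FedAvg-style estimate is only approximately equal, and the gap can grow when site data are heterogeneous.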

Federated Learning for Clinical Structured Data: A Benchmark Comparison of Engineering and Statistical Approaches
