FedScore: A privacy-preserving framework for federated scoring system development

Siqi Li, Yilin Ning, Marcus Eng Hock Ong, Bibhas Chakraborty, Chuan Hong, Feng Xie, Han Yuan, Mingxuan Liu, Daniel M. Buckland, Yong Chen, Nan Liu
DOI: https://doi.org/10.1016/j.jbi.2023.104485
2023-03-01
Abstract: We propose FedScore, a privacy-preserving federated learning framework for scoring system generation across multiple sites to facilitate cross-institutional collaborations. The FedScore framework includes five modules: federated variable ranking, federated variable transformation, federated score derivation, federated model selection and federated model evaluation. To illustrate usage and assess FedScore's performance, we built a hypothetical global scoring system for mortality prediction within 30 days after a visit to an emergency department using 10 simulated sites divided from a tertiary hospital in Singapore. We employed a pre-existing score generator to construct 10 local scoring systems independently at each site and we also developed a scoring system using centralized data for comparison. We compared the acquired FedScore model's performance with that of other scoring models using the receiver operating characteristic (ROC) analysis. The FedScore model achieved an average area under the curve (AUC) value of 0.763 across all sites, with a standard deviation (SD) of 0.020. We also calculated the average AUC values and SDs for each local model, and the FedScore model showed promising accuracy and stability with a high average AUC value which was closest to the one of the pooled model and SD which was lower than that of most local models. This study demonstrates that FedScore is a privacy-preserving scoring system generator with potentially good generalizability.
Machine Learning, Artificial Intelligence, Cryptography and Security
What problem does this paper attempt to address?
The paper asks how institutions can collaborate to build a privacy-preserving federated scoring system (FedScore) without sharing patient-level data, so that the resulting clinical scoring models are widely applicable and stable. Specifically, it uses a federated learning (FL) framework to work around the privacy-regulation limits and data silos that hinder traditional centralized data sharing, enabling cooperation among multiple medical institutions and improving the generalizability and performance stability of the scoring system.

### Main problem summary:

1. **Data privacy and compliance**: Traditional centralized data sharing can hardly meet the requirements of privacy regulations (such as the General Data Protection Regulation in the European Union), which makes cross-institutional collaboration difficult.
2. **Generalization ability of the scoring system**: A scoring system built from single-source data may perform poorly at other medical institutions because of insufficient sample size or poor data representativeness.
3. **Stability of the scoring system**: Data distributions may differ between medical institutions, which makes the performance of local models unstable.

### Solutions:

- **FedScore framework**: A cross-institutional scoring system is built through five modules (federated variable ranking, federated variable transformation, federated score derivation, federated model selection, and federated model evaluation), with the aim of preserving privacy while keeping the model generalizable and stable.
- **Experimental verification**: The framework was evaluated in simulation experiments on predicting mortality within 30 days after an emergency department visit, showing its potential in multi-site applications.

### Core contributions of the paper:

- Proposed the first privacy-preserving framework, FedScore, for constructing a federated scoring system.
- Showed in experiments that FedScore can address cross-institutional data privacy constraints while maintaining high accuracy and low performance variation.
- Emphasized that applying federated learning in the medical field requires special attention to model interpretability and transparency to meet the needs of clinical practice.

In conclusion, the paper offers a practical approach to data privacy problems in cross-institutional collaboration, which can support joint research in the medical field and the development of high-quality clinical decision-support systems.
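The evaluation described in the abstract, where a single scoring model is applied at every site and summarized by the mean and standard deviation of its AUC, can be sketched as follows. This is a minimal illustration under stated assumptions, not the FedScore implementation: it performs no federated training, the point table `POINTS`, the variables `age` and `heart_rate`, the synthetic outcome model in `simulate_site`, and the use of the sample standard deviation are all hypothetical choices made only to keep the example self-contained.

```python
import math
import random
import statistics

# Hypothetical point table, NOT taken from the paper: each variable maps to
# (lower_bound, points) bins sorted by ascending lower bound.
POINTS = {
    "age": [(0, 0), (50, 2), (70, 4)],
    "heart_rate": [(0, 0), (100, 1), (120, 3)],
}


def score(patient):
    """For each variable, take the points of the highest bin reached; sum them."""
    total = 0
    for var, bins in POINTS.items():
        pts = 0
        for lower, p in bins:
            if patient[var] >= lower:
                pts = p
        total += pts
    return total


def auc(scores, labels):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def simulate_site(rng, n=300):
    """Draw n synthetic patients; the outcome model is an arbitrary assumption."""
    patients, labels = [], []
    for _ in range(n):
        p = {"age": rng.uniform(20, 95), "heart_rate": rng.uniform(50, 150)}
        logit = 0.04 * (p["age"] - 60) + 0.02 * (p["heart_rate"] - 90) - 1.5
        died = rng.random() < 1 / (1 + math.exp(-logit))
        patients.append(p)
        labels.append(int(died))
    return patients, labels


# Apply the same scoring table at 10 simulated sites and collect per-site AUCs.
rng = random.Random(0)
site_aucs = []
for _ in range(10):
    patients, labels = simulate_site(rng)
    site_aucs.append(auc([score(p) for p in patients], labels))

mean_auc = statistics.mean(site_aucs)
sd_auc = statistics.stdev(site_aucs)  # sample SD; the paper's convention is not stated here
print(f"mean AUC = {mean_auc:.3f}, SD = {sd_auc:.3f}")
```

A high mean AUC indicates accuracy, while a low SD across sites indicates that the model behaves consistently from one site to the next, which is the sense of "stability" used in the summary above.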