FedDCL: a federated data collaboration learning as a hybrid-type privacy-preserving framework based on federated learning and data collaboration

Akira Imakura, Tetsuya Sakurai
2024-09-27
Abstract: Recently, federated learning has attracted much attention as a privacy-preserving approach that enables integrated analysis of data held by multiple institutions without sharing raw data. However, federated learning requires iterative communication across institutions, which makes it hard to implement in situations where continuous communication with the outside world is extremely difficult. In this study, we propose federated data collaboration learning (FedDCL), which solves such communication issues by combining federated learning with a recently proposed non-model-sharing type of federated learning called data collaboration analysis. In the proposed FedDCL framework, each user institution independently constructs dimensionality-reduced intermediate representations and shares them with neighboring institutions on intra-group DC servers. On each intra-group DC server, intermediate representations are transformed into incorporable forms called collaboration representations. Federated learning is then conducted between intra-group DC servers. The proposed FedDCL framework does not require iterative communication by user institutions and can be implemented in situations where continuous communication with the outside world is extremely difficult. The experimental results show that the performance of the proposed FedDCL is comparable to that of existing federated learning.
Machine Learning, Cryptography and Security
What problem does this paper attempt to address?
This paper addresses privacy-preserving integrated analysis of data held by multiple institutions or countries, in particular when those institutions cannot maintain continuous communication with the outside world. It is motivated by medical data, and especially by research on rare diseases, where the data held by a single institution may be too small to give adequate analysis accuracy. The paper proposes a new framework, federated data collaboration learning (FedDCL), which combines federated learning with data collaboration analysis so that user institutions do not need frequent cross-institution communication. Each institution independently constructs dimensionality-reduced intermediate representations and shares them only within its group, which reduces its dependence on external communication while keeping raw data private.

### Main Problems and Solutions

1. **Problem**: Existing methods for privacy-preserving integrated analysis across institutions or countries, such as federated learning, require iterative communication and are therefore difficult to deploy where data privacy and communication are both constrained.
2. **Solution**: The FedDCL framework proceeds in the following steps:
   - **Dimensionality-Reduced Intermediate Representation**: Each user institution independently constructs a dimensionality-reduced intermediate representation of its data and shares it with the intra-group DC server.
   - **Collaboration Representation**: On each intra-group DC server, the intermediate representations are transformed into a form that can be merged, called the collaboration representation.
   - **Federated Learning**: Federated learning is carried out between the intra-group DC servers to produce an integrated model.
   - **Model Return**: The final integrated model and the transformation matrix are returned to each user institution, which uses them to make predictions on its original data.

### Experimental Verification

The paper evaluates FedDCL in two sets of experiments:

- **Proof-of-Concept Experiment**: On the BatterySmall dataset, the paper examines the convergence and performance of FedDCL on a regression task and reports that it maintains high analysis performance while reducing the number of communications.
- **Multi-Dataset Prediction Performance Evaluation**: Prediction performance is evaluated on six datasets, including BatterySmall, CreditRating Historical, eICU, and HumanActivity, to assess how well FedDCL applies to different tasks.

### Conclusion

The FedDCL framework enables privacy-preserving integrated analysis in settings where continuous communication is not possible. The experimental results show that FedDCL is comparable in performance to existing federated learning methods, while user institutions take no part in iterative communication and share only dimensionality-reduced representations rather than raw data. The framework offers a new option for privacy-preserving analysis of medical data, especially for joint research on rare diseases.
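
The four steps listed under Main Problems and Solutions can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes details the summary does not state: a shareable dummy `anchor` dataset, a random linear map `F` as each institution's private dimensionality reduction, a commonly agreed alignment `target`, a least-squares matrix `G` as the transformation matrix, and plain FedAvg on a linear regression model with one full-batch gradient step per round. All names (`make_institution`, `group_server`, `fedavg`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k, r = 6, 4, 50            # raw features, reduced dimension, anchor rows
w_true = rng.normal(size=m)   # hidden linear rule used to label the toy data

def make_institution(n):
    """Toy private data (X, y) and a private linear reduction map F (assumption)."""
    X = rng.normal(size=(n, m))
    y = X @ w_true + 0.1 * rng.normal(size=n)
    F = rng.normal(size=(m, k))
    return X, y, F

anchor = rng.normal(size=(r, m))            # shareable dummy data (assumption)
target = anchor @ rng.normal(size=(m, k))   # agreed alignment target (assumption)

def group_server(institutions):
    """Intra-group DC server: align intermediate representations and stack them."""
    reps, labels, maps = [], [], []
    for X, y, F in institutions:
        inter, inter_anchor = X @ F, anchor @ F   # what an institution uploads once
        # Transformation matrix G: least-squares fit of the anchor image to the target.
        G, *_ = np.linalg.lstsq(inter_anchor, target, rcond=None)
        reps.append(inter @ G)                    # collaboration representation
        labels.append(y)
        maps.append(G)
    return np.vstack(reps), np.concatenate(labels), maps

def fedavg(servers, rounds=200):
    """FedAvg between DC servers: one local gradient step, sample-weighted average."""
    n_total = sum(len(y) for _, y in servers)
    # Step size from the largest local curvature, so the pooled loss cannot increase.
    lr = 1.0 / max(np.linalg.eigvalsh(Z.T @ Z / len(y))[-1] for Z, y in servers)
    w = np.zeros(k)
    for _ in range(rounds):
        updates = [w - lr * Z.T @ (Z @ w - y) / len(y) for Z, y in servers]
        w = sum(len(y) / n_total * u for (_, y), u in zip(servers, updates))
    return w

# Two groups of two institutions each; only the DC servers iterate with each other.
groups = [[make_institution(40), make_institution(60)],
          [make_institution(50), make_institution(30)]]
outputs = [group_server(g) for g in groups]
servers = [(Z, y) for Z, y, _ in outputs]
w = fedavg(servers)

# Model return: an institution predicts from raw data using its own F, its G, and w.
X0, y0, F0 = groups[0][0]
G0 = outputs[0][2][0]
pred0 = X0 @ F0 @ G0 @ w
```

In this sketch each institution uploads its reduced data once, and the iterative loop in `fedavg` involves only the two group servers, which mirrors the communication pattern the summary describes. Because the alignment is a least-squares fit, the collaboration representations of different institutions agree only approximately.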