Partial Federated Learning

Tiantian Feng, Anil Ramakrishna, Jimit Majmudar, Charith Peris, Jixuan Wang, Clement Chung, Richard Zemel, Morteza Ziyadi, Rahul Gupta
2024-03-04
Abstract: Federated Learning (FL) is a popular algorithm to train machine learning models on user data constrained to edge devices (for example, mobile phones) due to privacy concerns. Typically, FL assumes that no part of the user data can be egressed from the edge. However, in many production settings, specific data modalities or metadata must remain on the device while others need not. For example, in commercial SLU systems, it is typically desired to prevent transmission of biometric signals (such as audio recordings of the input prompt) to the cloud, but egress of locally (i.e. on the edge device) transcribed text to the cloud may be possible. In this work, we propose a new algorithm called Partial Federated Learning (PartialFL), where a machine learning model is trained using data where a subset of data modalities or their intermediate representations can be made available to the server. We further restrict our model training by preventing the egress of data labels to the cloud for better privacy, and instead use a contrastive learning based model objective. We evaluate our approach on two different multi-modal datasets and show promising results with our proposed approach.
Machine Learning; Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
The paper addresses how to exploit partially shareable data modalities in Federated Learning (FL) to improve model performance while preserving privacy. Specifically:

1. **Data Modality Heterogeneity**: Existing federated learning frameworks usually assume that all data modalities are subject to the same constraint: either all data can be used for centralized training, or all data must remain on local devices for federated training. In practice, however, some modalities (such as text) may be uploaded to a central server, while others (such as audio or other biometric signals) must be strictly protected.
2. **Limitations of Existing Methods**: Traditional federated learning methods perform poorly under data heterogeneity and the limited computational capacity of edge devices, and split learning, although it can handle multi-modal data, incurs significant communication overhead.

To address this, the authors propose Partial Federated Learning (PartialFL), which lets the server use the shareable data modalities (or their intermediate representations) to improve model performance, while sensitive modalities and data labels never reach the central server; in place of labels, training relies on a contrastive learning objective. In this way, PartialFL retains the advantages of federated learning while improving performance on multi-modal data and reducing communication costs and privacy leakage risks. Experiments on two multi-modal datasets, including an emotion recognition task, show that it achieves better results than traditional federated learning schemes.
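The summary describes PartialFL only at a high level. As a rough illustration of how its two privacy ingredients (private modalities that stay on device, and a label-free contrastive objective) can coexist with federated aggregation, here is a minimal NumPy sketch. It is not the authors' implementation, and every choice in it is an assumption: a linear on-device encoder for the private modality, a frozen encoder for the shareable modality whose embeddings are available on the client, InfoNCE as a stand-in for the paper's contrastive objective, plain FedAvg weighted by sample count, and synthetic data. All function names (`info_nce_loss_and_grad`, `client_update`, `fedavg`, `global_loss`) are hypothetical.

```python
import numpy as np


def info_nce_loss_and_grad(W, X_private, T_shared, tau):
    """Label-free InfoNCE loss and its gradient for a linear on-device encoder.

    Row i of X_private (private features, e.g. audio) is paired with row i of
    T_shared (embedding of the shareable modality, e.g. a transcript); the
    other rows in the batch act as negatives, so no class labels are used.
    """
    n = X_private.shape[0]
    z = X_private @ W                                      # on-device embeddings, (n, k)
    logits = z @ T_shared.T / tau                          # dot-product similarities, (n, n)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    loss = -np.mean(np.diag(log_probs))                    # matching pair is the positive
    grad_logits = (np.exp(log_probs) - np.eye(n)) / n      # d loss / d logits
    grad_W = X_private.T @ grad_logits @ T_shared / tau    # chain rule back to W
    return loss, grad_W


def client_update(W_global, X_private, T_shared, tau=0.5, lr=0.02, local_steps=5):
    """Local gradient steps on one client; only the updated weights are returned."""
    W = W_global.copy()
    for _ in range(local_steps):
        _, grad = info_nce_loss_and_grad(W, X_private, T_shared, tau)
        W -= lr * grad
    return W


def fedavg(client_weights, client_sizes):
    """Server-side FedAvg: average client weights, weighted by sample count."""
    sizes = np.asarray(client_sizes, dtype=float)
    return np.tensordot(sizes / sizes.sum(), np.stack(client_weights), axes=1)


def global_loss(W, clients, tau=0.5):
    """Sample-weighted mean of the per-client InfoNCE losses (for monitoring)."""
    losses = [info_nce_loss_and_grad(W, X, T, tau)[0] for X, T in clients]
    return float(np.average(losses, weights=[len(X) for X, _ in clients]))


# Synthetic stand-in data (hypothetical): 3 clients with 16 paired samples each.
# The shareable embeddings are a noise-free linear function of the private
# features, L2-normalized per row.
rng = np.random.default_rng(0)
d_private, k = 8, 4
W_true = rng.normal(size=(d_private, k))
clients = []
for _ in range(3):
    X = rng.normal(size=(16, d_private))
    T = X @ W_true
    T /= np.linalg.norm(T, axis=1, keepdims=True)
    clients.append((X, T))

W = rng.normal(scale=0.01, size=(d_private, k))
initial_loss = global_loss(W, clients)
for _ in range(20):  # communication rounds
    local_models = [client_update(W, X, T) for X, T in clients]
    W = fedavg(local_models, [len(X) for X, _ in clients])
final_loss = global_loss(W, clients)
print(f"mean InfoNCE loss: {initial_loss:.3f} -> {final_loss:.3f}")
```

Only the weight matrices returned by `client_update` cross the network; `X_private` stays on each client and no labels appear anywhere. In the setting the paper targets, the server could additionally exploit the egressed modality (for example, by training the text side centrally), which this sketch deliberately omits by keeping `T_shared` fixed.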