Federated Transfer Learning with Differential Privacy

Mengchu Li, Ye Tian, Yang Feng, Yi Yu
2024-04-09
Abstract: Federated learning is gaining increasing popularity, with data heterogeneity and privacy being two prominent challenges. In this paper, we address both issues within a federated transfer learning framework, aiming to enhance learning on a target data set by leveraging information from multiple heterogeneous source data sets while adhering to privacy constraints. We rigorously formulate the notion of *federated differential privacy*, which offers privacy guarantees for each data set without assuming a trusted central server. Under this privacy constraint, we study three classical statistical problems, namely univariate mean estimation, low-dimensional linear regression, and high-dimensional linear regression. By investigating the minimax rates and identifying the costs of privacy for these problems, we show that federated differential privacy is an intermediate privacy model between the well-established local and central models of differential privacy. Our analyses incorporate data heterogeneity and privacy, highlighting the fundamental costs of both in federated learning and underscoring the benefit of knowledge transfer across data sets.
Machine Learning, Cryptography and Security, Statistics Theory, Methodology
What problem does this paper attempt to address?
The paper asks how, within a federated learning framework, information from multiple heterogeneous source datasets can be used to improve learning on a target dataset while preserving data privacy. Specifically, it combines differential privacy (DP) with federated transfer learning (FTL) to address two main challenges: data heterogeneity and privacy protection.

### Core problems of the paper

1. **Data heterogeneity**: Datasets can differ substantially from one another, which degrades learning performance. The paper explores how to identify and use source datasets that are similar to the target dataset, avoiding "negative transfer", in which irrelevant or dissimilar data harms learning instead of helping it.
2. **Privacy protection**: In federated learning, directly exchanging raw data creates privacy risks. The paper proposes federated differential privacy (FDP), which protects the privacy of each dataset without relying on a trusted central server.

### Specific research contents

- **Definition of federated differential privacy (FDP)**: The paper defines a new privacy constraint, FDP, under which each site protects its data locally and transmits only privatized information to the central server.
- **Statistical estimation problems**: The paper studies the minimax risks of three classical statistical problems under the FDP constraint:
  1. **Univariate mean estimation**
  2. **Low-dimensional linear regression**
  3. **High-dimensional linear regression**
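
The FDP idea described above (each site privatizes locally and the server sees only privatized output) can be illustrated for univariate mean estimation. The following is a minimal one-shot sketch, not the paper's estimator: it assumes pure epsilon-DP per site via clipping plus the Laplace mechanism, and the names `laplace_noise`, `private_site_mean`, and `aggregate`, as well as the clipping bound and sample sizes, are hypothetical choices made for illustration.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_site_mean(values, clip, epsilon, rng):
    """Release one site's mean under epsilon-DP (assumed mechanism: clip + Laplace)."""
    clipped = [max(-clip, min(clip, v)) for v in values]
    mean = sum(clipped) / len(clipped)
    # Replacing a single record changes the clipped mean by at most 2 * clip / n.
    sensitivity = 2.0 * clip / len(clipped)
    return mean + laplace_noise(sensitivity / epsilon, rng)


def aggregate(site_means, site_sizes):
    """Server side: sample-size-weighted average of the privatized site means."""
    total = sum(site_sizes)
    return sum(n * m for n, m in zip(site_sizes, site_means)) / total


rng = random.Random(0)
true_mean = 1.0
site_sizes = [2000, 2000, 2000]  # illustrative sizes, not from the paper

# Each site holds its own raw data; only the noisy mean leaves the site.
sites = [[rng.gauss(true_mean, 1.0) for _ in range(n)] for n in site_sizes]
released = [private_site_mean(x, clip=5.0, epsilon=1.0, rng=rng) for x in sites]

estimate = aggregate(released, site_sizes)
print(round(estimate, 3))
```

Because the noise is added before anything leaves a site, the server does not need to be trusted. The cost is one noise draw per site rather than one for the pooled data, which is consistent with the abstract's description of FDP as sitting between the local and central models.
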
### Main contributions

- **Source detection**: The paper develops a general detection method that identifies useful source datasets and automatically selects them for learning.
- **Optimality analysis**: The paper establishes optimality results under the FDP constraint for univariate mean estimation and low-dimensional linear regression.
- **Technical contribution**: An adaptive clipping strategy is introduced to achieve optimality over a wider range of parameters.

### Conclusion

Through rigorous theoretical analysis and experimental verification, the paper shows that differential privacy can be combined effectively with federated transfer learning, and it offers new methods for handling data heterogeneity and privacy protection together.
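
The source-selection step listed under the main contributions can be pictured with a simple threshold rule. This is a hypothetical sketch, not the paper's detection procedure: it assumes each site has already released a privatized estimate, and the functions `select_sources` and `pooled_estimate`, the threshold, and all numbers are made up for illustration.

```python
def select_sources(target_estimate, source_estimates, threshold):
    """Keep the sources whose privatized estimate lies within `threshold` of the target's."""
    return [
        k
        for k, est in enumerate(source_estimates)
        if abs(est - target_estimate) <= threshold
    ]


def pooled_estimate(target_estimate, n_target, source_estimates, source_sizes, selected):
    """Sample-size-weighted average over the target and the selected sources."""
    total = n_target + sum(source_sizes[k] for k in selected)
    weighted = n_target * target_estimate + sum(
        source_sizes[k] * source_estimates[k] for k in selected
    )
    return weighted / total


# Illustrative privatized estimates: the third source is far from the target,
# so pooling it would cause negative transfer.
target_est = 1.02
source_ests = [0.97, 1.10, 4.50]
source_sizes = [500, 800, 1000]

selected = select_sources(target_est, source_ests, threshold=0.25)
estimate = pooled_estimate(target_est, 200, source_ests, source_sizes, selected)
print(selected, round(estimate, 3))
```

In practice the threshold would have to account for both sampling error and the privacy noise in each released estimate; the fixed value used here is only a placeholder.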