Abstract: We consider vertical logistic regression (VLR) trained with mini-batch gradient descent, a setting that has attracted growing interest in industry and has proven useful in a wide range of applications, including finance and medical research. We provide a comprehensive and rigorous privacy analysis of VLR in a class of open-source Federated Learning frameworks whose protocols may differ from one another but implicitly share a common procedure for obtaining local gradients. We first consider the honest-but-curious threat model, in which the detailed implementation of the protocol is set aside and only the shared procedure, which we abstract as an oracle, is assumed. We find that even in this general setting, single-dimension features and labels of the other party can still be recovered under suitable constraints on the batch size, demonstrating the potential vulnerability of all frameworks that follow the same philosophy. We then examine a popular instantiation of the protocol based on Homomorphic Encryption (HE). We propose an active attack that significantly weakens the batch-size constraints of the previous analysis by generating and compressing auxiliary ciphertexts. To address the privacy leakage in the HE-based protocol, we develop a simple yet effective countermeasure based on Differential Privacy (DP) and provide both utility and privacy guarantees for the updated algorithm. Finally, we empirically verify the effectiveness of our attack and defense on benchmark datasets. Altogether, our findings suggest that all vertical federated learning frameworks that rely solely on HE may carry severe privacy risks, and that DP, which has already demonstrated its power in horizontal federated learning, can also play a crucial role in the vertical setting, especially when coupled with HE or secure multi-party computation (MPC) techniques.
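To make the batch-size constraint in the oracle setting concrete, here is a minimal Python sketch of one way label leakage can arise. It is not the paper's attack. It assumes a plain two-party VLR model in which a passive party B receives its own plaintext mini-batch gradient g_B = (1/b) X_B^T d, where the residuals are d_i = sigmoid(z_i) - y_i. All function names (`simulate_batch`, `solve`, `recover_labels`) are hypothetical. When the batch size b does not exceed B's feature count and X_B has full row rank, B can solve for d exactly. Because sigmoid(z_i) lies in (0, 1), the sign of d_i then reveals y_i.

```python
import math
import random


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def simulate_batch(seed: int, b: int, p_a: int, p_b: int):
    """Toy VLR mini-batch (hypothetical setup): party A holds x_a and labels y, party B holds x_b.

    Returns what B sees, namely its features x_b and its local gradient
    g_b = (1/b) * X_B^T d, plus the true labels y (used only for checking).
    """
    rng = random.Random(seed)
    x_a = [[rng.gauss(0, 1) for _ in range(p_a)] for _ in range(b)]
    x_b = [[rng.gauss(0, 1) for _ in range(p_b)] for _ in range(b)]
    w_a = [rng.gauss(0, 1) for _ in range(p_a)]
    w_b = [rng.gauss(0, 1) for _ in range(p_b)]
    y = [rng.randint(0, 1) for _ in range(b)]
    # Joint logit z_i = w_A . x_{A,i} + w_B . x_{B,i}; residual d_i = sigmoid(z_i) - y_i.
    d = [
        sigmoid(sum(w * x for w, x in zip(w_a, x_a[i])) + sum(w * x for w, x in zip(w_b, x_b[i]))) - y[i]
        for i in range(b)
    ]
    # Gradient-oracle output for B's weights (mean over the batch).
    g_b = [sum(d[i] * x_b[i][j] for i in range(b)) / b for j in range(p_b)]
    return x_b, g_b, y


def solve(a, rhs):
    """Solve the square system a @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def recover_labels(x_b, g_b):
    """Recover d from g_b (needs b <= p_b and rank(X_B) = b), then read labels off the signs of d."""
    b, p = len(x_b), len(x_b[0])
    # Normal equations: (X_B X_B^T) d = X_B (b * g_B); the b x b Gram matrix is invertible when rank(X_B) = b.
    gram = [[sum(x_b[i][k] * x_b[j][k] for k in range(p)) for j in range(b)] for i in range(b)]
    rhs = [sum(x_b[i][k] * b * g_b[k] for k in range(p)) for i in range(b)]
    d = solve(gram, rhs)
    # d_i lies in (0, 1) when y_i = 0 and in (-1, 0) when y_i = 1.
    return [1 if di < 0 else 0 for di in d]


if __name__ == "__main__":
    x_b, g_b, y = simulate_batch(seed=0, b=4, p_a=3, p_b=6)  # b <= p_b
    print("true labels:     ", y)
    print("recovered labels:", recover_labels(x_b, g_b))
```

When b exceeds B's feature count, the system X_B^T d = b g_B is underdetermined, so this simple inversion no longer pins down d. This is consistent with why the oracle-level analysis needs a batch-size constraint. The recovery also hinges on the exact signs of d_i, so adding calibrated noise to residuals or gradients, the general idea behind a DP countermeasure, makes this inference unreliable. The paper's specific mechanism and guarantees are not modeled in this sketch.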

DR-Encoder: Encode Low-rank Gradients with Random Prior for Large Language Models Differentially Privately

Differentially Private Low-Rank Adaptation of Large Language Model Using Federated Learning

CG-FedLLM: How to Compress Gradients in Federated Fine-tuning for Large Language Models

DP-DyLoRA: Fine-Tuning Transformer-Based Models On-Device under Differentially Private Federated Learning using Dynamic Low-Rank Adaptation

Fine-Tuning Large Language Models with User-Level Differential Privacy

LMO-DP: Optimizing the Randomization Mechanism for Differentially Private Fine-Tuning (Large) Language Models

Learning Differentially Private Recurrent Language Models

Large Language Models Can Be Strong Differentially Private Learners

Encryption-Friendly LLM Architecture

An Efficient DP-SGD Mechanism for Large Scale NLP Models

A Fine-Grained Differentially Private Federated Learning Against Leakage from Gradients

DP-LSSGD: A Stochastic Optimization Method to Lift the Utility in Privacy-Preserving ERM

Differentially Private Next-Token Prediction of Large Language Models

Shield Against Gradient Leakage Attacks: Adaptive Privacy-Preserving Federated Learning

Split-and-Denoise: Protect large language model inference with local differential privacy

Model-Based Differentially Private Knowledge Transfer for Large Language Models

Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning

Client-based differential privacy federated learning

On the Implicit Relation Between Low-Rank Adaptation and Differential Privacy

Is Vertical Logistic Regression Privacy-Preserving? A Comprehensive Privacy Analysis and Beyond