A Split-and-Privatize Framework for Large Language Model Fine-Tuning

Xicong Shen, Yang Liu, Huiqi Liu, Jue Hong, Bing Duan, Zirui Huang, Yunlong Mao, Ye Wu, Di Wu

DOI: https://doi.org/10.48550/arXiv.2312.15603

2023-12-25

Abstract: Fine-tuning is a prominent technique for adapting a pre-trained language model to downstream scenarios. In parameter-efficient fine-tuning, only a small subset of modules is trained on the downstream dataset, while the rest of the pre-trained model is left frozen to save computation resources. In recent years, Model-as-a-Service (MaaS) has emerged as a popular product form, in which vendors provide abundant pre-trained language models, server resources, and core functions, and customers can fine-tune, deploy, and invoke their customized models through the one-stop MaaS using their own private datasets. In this paper, we identify the model and data privacy leakage risks in MaaS fine-tuning and propose a Split-and-Privatize (SAP) framework, which mitigates these privacy issues by adapting the existing split learning architecture. The proposed SAP framework is evaluated through extensive experiments, and the results indicate that it can enhance empirical privacy by 62% at the cost of 1% model performance degradation on the Stanford Sentiment Treebank dataset.
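To make the split concrete, here is a minimal Python sketch of one forward pass under stated assumptions: the customer holds only the embedding layer, perturbs each token embedding locally, and sends only the noisy vectors to the vendor, who runs a frozen trunk plus a small trainable head. The abstract does not name the perturbation mechanism; we assume d_χ-privacy (metric local differential privacy), a common choice for text embeddings whose noise density is proportional to exp(-ε‖z‖). The function names (`sample_dchi_noise`, `customer_forward`, `vendor_forward`) and the toy embeddings are hypothetical, not the authors' code.

```python
import math
import random


def sample_dchi_noise(dim, epsilon, rng):
    """Sample z in R^dim with density proportional to exp(-epsilon * ||z||_2).

    Construction: a uniformly random direction (normalized Gaussian vector)
    scaled by a radius drawn from Gamma(shape=dim, scale=1/epsilon).
    """
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in direction))
    radius = rng.gammavariate(dim, 1.0 / epsilon)
    return [radius * x / norm for x in direction]


def customer_forward(tokens, embeddings, epsilon, rng):
    """Customer side (hypothetical): embed tokens locally, then privatize each vector.

    Only the returned noisy vectors are sent to the vendor; raw tokens stay local.
    """
    privatized = []
    for token in tokens:
        vec = embeddings[token]
        noise = sample_dchi_noise(len(vec), epsilon, rng)
        privatized.append([v + n for v, n in zip(vec, noise)])
    return privatized


def vendor_forward(privatized, head_weights, head_bias):
    """Vendor side (hypothetical): a frozen trunk, reduced here to mean pooling,
    followed by a small trainable linear head, the only part PEFT would update."""
    dim = len(privatized[0])
    pooled = [sum(vec[i] for vec in privatized) / len(privatized) for i in range(dim)]
    return sum(w * x for w, x in zip(head_weights, pooled)) + head_bias


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy 4-dimensional embedding table; values are made up for illustration.
    embeddings = {
        "good": [0.8, 0.1, 0.0, 0.3],
        "movie": [0.1, 0.5, 0.2, 0.0],
    }
    vectors = customer_forward(["good", "movie"], embeddings, epsilon=10.0, rng=rng)
    score = vendor_forward(vectors, head_weights=[0.5, -0.5, 0.25, 0.0], head_bias=0.0)
    print(len(vectors), round(score, 3))
```

A smaller ε adds more noise (the expected noise norm under this mechanism is dim/ε), which strengthens privacy but distorts the representations the vendor trains on; this is the privacy-utility trade-off the paper measures.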

Computation and Language

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper aims to solve the problems of data and model privacy leakage when fine-tuning large language models in the Model-as-a-Service (MaaS) scenario. Specifically:

1. **Model privacy**: Pre-trained language models (PLMs) usually contain millions or even billions of parameters and are regarded as the vendor's proprietary assets, so they cannot be made public. Customers therefore cannot directly access the complete model weights.
2. **Data privacy**: Customers' text data often contains sensitive information, such as identity and asset information. Transmitting the raw data or its representations directly to the vendor can lead to serious privacy leakage.
3. **Balance between privacy protection and performance**: Existing privacy-protection methods (such as differential privacy) can protect data privacy, but they often degrade model performance on downstream tasks. A method that protects privacy while maintaining model performance is therefore required.

To solve these problems, the authors propose a Split-and-Privatize (SAP) framework built on the existing split-learning architecture. SAP mitigates privacy leakage by splitting the model between customer and vendor and applying privacy-protection mechanisms, and it optimizes the privacy-utility trade-off through the Contribution Token Identification (CTI) method. Experimental results show that SAP can maintain high model performance while protecting both model and data privacy.
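This summary does not describe the exact CTI procedure. As a hypothetical sketch of the idea its name suggests, the Python function below assumes each token already has a contribution score (how the score is computed, for example from gradients or attention, is left open) and gives the highest-scoring tokens a larger privacy budget `eps_high` (weaker perturbation), while all other tokens, which may include sensitive names, get a smaller budget `eps_low` (stronger perturbation). `assign_token_epsilons` and its parameters are illustrative names, not the authors' API.

```python
import math


def assign_token_epsilons(scores, top_fraction, eps_high, eps_low):
    """Hypothetical CTI-style budget allocation.

    The top ceil(top_fraction * n) tokens by contribution score receive
    eps_high (weaker perturbation); every other token receives eps_low.
    """
    if not 0.0 <= top_fraction <= 1.0:
        raise ValueError("top_fraction must be in [0, 1]")
    k = math.ceil(top_fraction * len(scores))
    # Indices sorted by descending contribution score (stable for ties).
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = set(ranked[:k])
    return [eps_high if i in top else eps_low for i in range(len(scores))]


if __name__ == "__main__":
    tokens = ["alice", "loved", "this", "movie"]
    scores = [0.05, 0.9, 0.1, 0.3]  # made-up contribution scores
    budgets = assign_token_epsilons(scores, top_fraction=0.5, eps_high=20.0, eps_low=2.0)
    for token, eps in zip(tokens, budgets):
        print(token, eps)  # alice 2.0, loved 20.0, this 2.0, movie 20.0
```

Here the sentiment-bearing tokens "loved" and "movie" keep a budget of 20.0, while "alice" and "this" drop to 2.0, so any noise mechanism whose scale is proportional to 1/ε would perturb "alice" ten times more strongly than "loved".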
