Abstract: Large Language Models (LLMs) excel in natural language understanding by capturing hidden semantics in vector space. This process enriches the value of text embeddings for various downstream tasks, thereby fostering the Embedding-as-a-Service (EaaS) business model. However, the risk of privacy leakage due to direct text transmission to servers remains a critical concern. To address this, we introduce Split-N-Denoise (SnD), a private inference framework that splits the model to execute the token embedding layer on the client side at minimal computational cost. This allows the client to introduce noise prior to transmitting the embeddings to the server, and subsequently receive and denoise the perturbed output embeddings for downstream tasks. Our approach is designed for the inference stage of LLMs and requires no modifications to the model parameters. Extensive experiments demonstrate SnD's effectiveness in optimizing the privacy-utility tradeoff across various LLM architectures and diverse downstream tasks. The results reveal an improvement in performance under the same privacy budget compared to the baselines by over 10% on average, offering clients a privacy-preserving solution for local privacy protection.

What problem does this paper attempt to address?

The problem this paper addresses is how to protect users' privacy when using large language models (LLMs) for inference. Specifically, when users obtain text embeddings generated by LLMs through a network service (i.e., "Embedding-as-a-Service", EaaS), transmitting raw text directly to the server risks privacy leakage. To solve this, the authors propose the Split-N-Denoise (SnD) framework, a private inference framework. It protects users' privacy by executing the token embedding layer on the client side and adding noise before the embeddings are transmitted. SnD also includes a denoising module that lets the client denoise the perturbed output embeddings returned by the server before using them in downstream tasks. The method requires no modification of the model parameters and optimizes the privacy-utility trade-off across various LLM architectures and downstream tasks.

### Main Contributions

- **Proposing the SnD Framework**: Combining split inference and denoising techniques to protect users' privacy under the constraint of local differential privacy (LDP). Empirical studies show that the method improves average performance by more than 10% over existing differential-privacy-based methods under the same privacy budget, and maintains utility even in extremely low privacy budget settings (\(\eta \leq 0.01\)).
- **Designing an Innovative Denoising Method**: A denoising model is pre-trained on the server side using public datasets and synthetic noise, then deployed on the client side, where it enhances the embeddings using the specific noise level provided by the user and the original intermediate results (IRs).

### Method Overview

1. **Local Encoder Module**: The user obtains the token embeddings of the input locally.
2. **Privatization Module**: The user privatizes the token representations before transmitting them to the server, to meet the LDP requirements.
3. **Cloud Encoder Module**: The server transforms the privatized token representations and returns the embeddings to the user.
4. **Denoising Module**: The user locally denoises the received embeddings using the original input and the specific noise level, to optimize the balance between privacy and utility.

### Noise Mechanism

The authors use \(d_\chi\)-privacy to privatize the client's token representation layer. Given an input sequence \(x = [x_1,\ldots,x_n]\), the token representation layer converts it into a vector sequence \(X = [x_1,\ldots,x_n] \in \mathbb{R}^{n\times d}\). Assuming the L2 norm is used as the distance metric, the \(d_\chi\)-privacy mechanism with parameter \(\eta\) is applied to a given word embedding \(x_t \in \mathbb{R}^d\) by adding Laplace noise \(z \sim c\exp(-\eta \|z\|)\), where \(c\) is a real-valued constant, giving the privatized representation \(M(x_t) = x_t + z\). To improve the performance of the denoising model, the client clips the L2 norm of the privatized representation so that it does not exceed \(C_{x_t}\):

\[M'(x_t) = M(x_t)\cdot\min\left(1,\frac{C_{x_t}}{\|M(x_t)\|}\right)\]

where \(C_{x_t}=\max_{x_t \in X_t}\|x_t\|\).

### Denoising Model

Server-side denoising is limited because the server does not know the noise level, which restricts its denoising ability. The authors therefore propose a client-side denoising framework, in which the user applies its specific noise and original input to correct errors in the perturbed embeddings.
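The privatization step described under Noise Mechanism can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the noise is sampled by drawing a uniform direction and a Gamma-distributed radius (a standard way to obtain a density proportional to \(\exp(-\eta\|z\|)\) in \(\mathbb{R}^d\), not something the summary specifies), and the names `sample_dchi_noise`, `privatize`, and `embedding_table` are hypothetical.

```python
import numpy as np

def sample_dchi_noise(d, eta, rng):
    """Draw z in R^d with density proportional to exp(-eta * ||z||_2).

    Assumption: uniform direction on the unit sphere times a radius drawn
    from Gamma(shape=d, scale=1/eta), which yields that density.
    """
    direction = rng.standard_normal(d)
    direction /= np.linalg.norm(direction)
    radius = rng.gamma(shape=d, scale=1.0 / eta)
    return radius * direction

def privatize(X, eta, clip_norm, rng):
    """Add noise to each row of X (n x d), then clip each row's L2 norm.

    Returns the clipped privatized rows M'(x_t) and the sampled noise Z.
    """
    n, d = X.shape
    Z = np.stack([sample_dchi_noise(d, eta, rng) for _ in range(n)])
    noisy = X + Z                                   # M(x_t) = x_t + z
    norms = np.linalg.norm(noisy, axis=1, keepdims=True)
    # M'(x_t) = M(x_t) * min(1, C / ||M(x_t)||); guard against zero norm
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    return noisy * scale, Z

rng = np.random.default_rng(0)
# Hypothetical embedding table standing in for the client-side embedding layer.
embedding_table = rng.standard_normal((100, 16))
# Clipping bound C: the largest embedding norm in the (hypothetical) table.
clip_norm = np.linalg.norm(embedding_table, axis=1).max()

X = embedding_table[[3, 14, 15]]                    # embeddings of a 3-token input
X_priv, Z = privatize(X, eta=50.0, clip_norm=clip_norm, rng=rng)
```

In this sketch the client would send `X_priv` to the server and keep `Z` locally. A smaller `eta` gives larger noise (the expected noise radius is `d / eta`), which is why clipping matters most at low privacy budgets.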
Given the privatized token representations \(\tilde{X} = [\tilde{x}_1,\ldots,\tilde{x}_n]\) and the noise matrix \(Z = [z_1,\ldots,z_n] \in \mathbb{R}^{n\times d}\), the denoising model is parameterized by an \(L\)-layer Transformer decoder:

\[e_d = D(e_n,\tilde{X},Z)\]

The input of the denoising model is a concatenation of vectors:

\[H_0 = [e_n;\tilde{x}_1,\ldots,\tilde{

Split-and-Denoise: Protect large language model inference with local differential privacy
