PrivacyRestore: Privacy-Preserving Inference in Large Language Models via Privacy Removal and Restoration

Ziqian Zeng, Jianwei Wang, Junyao Yang, Zhengdong Lu, Huiping Zhuang, Cen Chen
2024-10-12
Abstract: The widespread usage of online Large Language Model (LLM) inference services has raised significant privacy concerns about the potential exposure of private information in user inputs to malicious eavesdroppers. Existing privacy protection methods for LLMs suffer from insufficient privacy protection, performance degradation, or large inference time overhead. To address these limitations, we propose PrivacyRestore, a plug-and-play method to protect the privacy of user inputs during LLM inference. The server first trains restoration vectors for each privacy span and then releases them to clients. A privacy span is defined as a contiguous sequence of tokens within a text that contains private information. The client then aggregates the restoration vectors of all privacy spans in the input into a single meta restoration vector, which is later sent to the server side along with the input without privacy spans. The private information is restored via activation steering during inference. Furthermore, we prove that PrivacyRestore inherently prevents the linear growth of the privacy budget. We create three datasets, covering medical and legal domains, to evaluate the effectiveness of privacy-preserving methods. The experimental results show that PrivacyRestore effectively protects private information and maintains acceptable levels of performance and inference overhead.
Cryptography and Security, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the protection of private information in user inputs to online inference services for large language models (LLMs). When users interact with LLMs deployed on cloud platforms, their inputs may contain sensitive and proprietary information, such as medical records, unpublished narrative works, and personal financial details. This information is at risk of being leaked to interceptors or untrusted service providers. Existing privacy-protection methods offer insufficient protection, degrade performance, or add significant inference-time overhead. The paper therefore proposes a new privacy-protection method, PrivacyRestore, which aims to protect the private information in user inputs while maintaining model performance and inference efficiency.

### Main Contributions

1. **Proposed a plug-and-play privacy-protection method**: Privacy fragments (privacy spans) are removed from user inputs, and the private information is restored through activation steering during model inference.
2. **Proposed Attention-Aware Weighted Aggregation (AWA)**: Meta restoration vectors are constructed so that all privacy fragments in the input are appropriately represented and attackers cannot infer the privacy fragments from the meta restoration vector alone.
3. **Constructed two medical-diagnosis datasets**: These are used to evaluate the proposed method; the results show that PrivacyRestore maintains acceptable performance and inference efficiency while protecting private information.

### Method Overview

- **Editing Attention-Head Identification**: Use probing techniques to determine the top K most relevant attention heads for each privacy fragment.
- **Restoration-Vector Training**: Train restoration vectors so that the model's predictions on the input with privacy fragments removed align with its predictions on the complete input.
- **Attention-Aware Weighted Aggregation (AWA)**: Compute a weighted sum of the restoration vectors, weighted by the importance of each privacy fragment, to produce a meta restoration vector.
- **Inference Process**: On the client side, the user removes the privacy fragments from the query and computes the meta restoration vector, then sends both to the server. The server uses the meta restoration vector to edit the attention heads and generates the final output.

### Experimental Results

The experimental results show that PrivacyRestore maintains acceptable model performance and inference efficiency while protecting private information. In medical-diagnosis tasks in particular, PrivacyRestore outperforms other methods at different privacy levels, which indicates its potential for practical applications.

### Formula Explanation

- **Attention Mechanism**: \[ U_{l,h}=\text{Attn}_{l,h}(X_{l,h}) \] where \(X_{l,h}\) is the input sequence of the \(h\)-th head in the \(l\)-th layer, and \(U_{l,h}\) is the output sequence.
- **Restoration-Vector Injection**: \[ y_{l,h}=u_{l,h}+R_{l,h} \] where \(u_{l,h} = U_{l,h}[-1]\) is the output at the last token, \(y_{l,h}\) is the output after privacy restoration, and \(R_{l,h}\) is the aggregated restoration vector of the \(h\)-th head in the \(l\)-th layer.
- **Classifier**: \[ F_{l,h}^s(u_{l,h})=\sigma(\theta\cdot u_{l,h}) \] where \(\theta\) is the parameter vector of the classifier and \(\sigma\) is the sigmoid function; the classifier determines whether the target privacy fragment \(s\) appears in the query.
- **Loss Function**: \[ \mathcal{L}_{\text{ORPO}}=-\log P(a_p\mid q;\Theta)-\log\sigma\left(\log\frac{\text{ratio}(a_p\mid q;\Theta)}{\text{ratio}(a_n\mid q;\Theta)}\right) \] where \(a_p\) and \(a_n\) are the responses of the positive and negative samples respectively, and \(\Theta\) is the set of restoration vectors.

Through these methods and techniques, PrivacyResto
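The aggregation and injection steps described above can be illustrated with a small NumPy sketch for a single attention head. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `meta_restoration_vector` and `inject` are hypothetical, and the importance weights are simply passed in, since the summary does not say how AWA computes them.

```python
import numpy as np


def meta_restoration_vector(restoration_vectors, weights):
    """Weighted sum of per-fragment restoration vectors for one attention head.

    restoration_vectors: shape (num_fragments, head_dim)
    weights: shape (num_fragments,); hypothetical importance scores, because
             the source text does not specify how they are computed.
    """
    vectors = np.asarray(restoration_vectors, dtype=float)
    w = np.asarray(weights, dtype=float)
    return w @ vectors  # shape (head_dim,)


def inject(head_output, meta_vector):
    """Apply y = u + R, where u = U[-1] is the last-token row of the head output U."""
    edited = np.array(head_output, dtype=float)  # copy, so the input is left untouched
    edited[-1] += meta_vector                    # only the last token position is edited
    return edited


# Toy example: two privacy fragments, head dimension 2, sequence length 3.
vectors = [[1.0, 0.0], [0.0, 2.0]]
weights = [0.25, 0.75]
R = meta_restoration_vector(vectors, weights)
U = np.zeros((3, 2))
Y = inject(U, R)
```

In this toy setup only the last row of `Y` differs from `U`, which mirrors the formula \(y_{l,h}=u_{l,h}+R_{l,h}\) with \(u_{l,h}=U_{l,h}[-1]\). In a real model the same edit would be applied to each selected head.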
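The probe classifier \(F_{l,h}^s(u_{l,h})=\sigma(\theta\cdot u_{l,h})\) and the top-K head selection can be sketched as follows. The summary does not state how heads are ranked, so ranking by probe accuracy is an assumption made here for illustration; the names `probe`, `probe_accuracy`, and `top_k_heads` are likewise hypothetical, and the probe parameters are taken as given, not trained.

```python
import numpy as np


def probe(theta, u):
    """Logistic probe sigma(theta . u) on one last-token head output u."""
    return 1.0 / (1.0 + np.exp(-np.dot(theta, u)))


def probe_accuracy(theta, head_outputs, labels):
    """Fraction of examples where the probe correctly predicts fragment presence."""
    preds = np.array([probe(theta, u) >= 0.5 for u in head_outputs])
    return float(np.mean(preds == np.asarray(labels, dtype=bool)))


def top_k_heads(accuracy_by_head, k):
    """Return the k (layer, head) pairs with the highest probe accuracy.

    Assumption: relevance of a head is measured by probe accuracy.
    """
    ranked = sorted(accuracy_by_head.items(), key=lambda kv: kv[1], reverse=True)
    return [head for head, _ in ranked[:k]]


# Toy example: one probe evaluated on two examples, then ranking three heads.
theta = [1.0, -1.0]
outputs = [[2.0, 0.0], [0.0, 2.0]]
labels = [1, 0]  # 1 = the target privacy fragment appears in the query
acc = probe_accuracy(theta, outputs, labels)
selected = top_k_heads({(0, 0): 0.6, (0, 1): 0.9, (1, 0): 0.75}, k=2)
```

The selected `(layer, head)` pairs would be the positions where restoration vectors are later trained and injected.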
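To make the loss function above concrete, here is a scalar sketch in plain Python. It assumes that `ratio` denotes the odds \(P/(1-P)\), as in the ORPO objective the loss is named after; the summary itself does not define `ratio`. It also treats \(P(a\mid q;\Theta)\) as a single probability strictly between 0 and 1, whereas in practice it would be derived from token-level probabilities. The function names are hypothetical.

```python
import math


def odds(p):
    """Odds p / (1 - p); assumed meaning of `ratio` in the loss (0 < p < 1)."""
    return p / (1.0 - p)


def neg_log_sigmoid(x):
    """Numerically stable -log(sigmoid(x)) = log(1 + exp(-x))."""
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


def orpo_loss(p_pos, p_neg):
    """Likelihood term on the positive response plus the odds-ratio preference term.

    p_pos: P(a_p | q; Theta), probability of the positive response
    p_neg: P(a_n | q; Theta), probability of the negative response
    """
    likelihood_term = -math.log(p_pos)
    log_odds_ratio = math.log(odds(p_pos)) - math.log(odds(p_neg))
    return likelihood_term + neg_log_sigmoid(log_odds_ratio)


# The loss shrinks as the positive response becomes more likely than the negative one.
loss_equal = orpo_loss(0.5, 0.5)
loss_better = orpo_loss(0.8, 0.2)
```

With equal probabilities both terms equal \(\log 2\), so `loss_equal` is \(2\log 2\); `loss_better` is smaller because the positive response is both more likely and preferred over the negative one.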