Contrast-Then-Approximate: Analyzing Keyword Leakage of Generative Language Models

Zhirui Zeng, Tao Xiang, Shangwei Guo, Jialing He, Qiao Zhang, Guowen Xu, Tianwei Zhang
DOI: https://doi.org/10.1109/tifs.2024.3392535
IF: 7.231
2024-05-10
IEEE Transactions on Information Forensics and Security
Abstract: There is an increasing tendency to fine-tune large-scale pre-trained language models (LMs) on small private datasets to improve their capability for downstream applications. In this paper, we systematically analyze the pre-train-then-fine-tune process of generative LMs and show that fine-tuned LMs leak sensitive keywords from the private datasets, even when the attacker has no prior knowledge of the downstream tasks. Specifically, we propose a novel and efficient keyword inference attack framework that recovers as many sensitive keywords as possible with high accuracy. Owing to the fine-tuning process, pre-trained and fine-tuned models might respond differently to identical input prefixes. To identify sentences that were potentially used to train the fine-tuned LM, we introduce a contrast difference score that assesses the response variations between a pre-trained LM and its corresponding fine-tuned LM. Following this, we iteratively fine-tune the pre-trained model on these sensitive sentences to minimize the disparity between the target model and the pre-trained model, thereby maximizing the number of inferred sensitive keywords. We implement two types of keyword inference attacks (i.e., domain and private) according to our framework and conduct comprehensive experiments on three downstream applications to evaluate their performance. The experimental results demonstrate that our domain keyword inference attack achieves a precision of 85%, while our private keyword inference attack can extract highly sensitive personal information for a significant number of individuals (approximately 0.3% of all customers in the private fine-tuning dataset, which contains 40,000 pieces of personal information).
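The contrast step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual formula for the contrast difference score, so the version below is an assumption: it scores a sentence by the gap in average per-token log-probability between the fine-tuned and the pre-trained model. Both models are replaced by toy unigram tables, and every name here (`avg_log_prob`, `contrast_score`, `rank_candidates`, `PRE_TRAINED`, `FINE_TUNED`) is hypothetical, chosen only for illustration.

```python
import math


def avg_log_prob(sentence, unigram_probs, floor=1e-6):
    """Average per-token log-probability under a toy unigram model.

    Tokens missing from the table fall back to a small floor probability.
    """
    tokens = sentence.lower().split()
    if not tokens:
        raise ValueError("sentence must contain at least one token")
    total = sum(math.log(unigram_probs.get(tok, floor)) for tok in tokens)
    return total / len(tokens)


def contrast_score(sentence, fine_tuned, pre_trained):
    """Assumed score: how much more likely the fine-tuned model finds the sentence."""
    return avg_log_prob(sentence, fine_tuned) - avg_log_prob(sentence, pre_trained)


def rank_candidates(candidates, fine_tuned, pre_trained):
    """Sort candidate sentences from highest to lowest contrast score."""
    return sorted(
        candidates,
        key=lambda s: contrast_score(s, fine_tuned, pre_trained),
        reverse=True,
    )


# Toy stand-ins for the two language models (illustrative numbers, not real data).
PRE_TRAINED = {"the": 0.05, "patient": 0.001, "has": 0.02, "diabetes": 0.0005,
               "weather": 0.002, "is": 0.03, "nice": 0.002}
# The "fine-tuned" table shifts probability mass toward domain words only.
FINE_TUNED = {"the": 0.05, "patient": 0.02, "has": 0.02, "diabetes": 0.01,
              "weather": 0.002, "is": 0.03, "nice": 0.002}

candidates = ["the weather is nice", "the patient has diabetes"]
ranked = rank_candidates(candidates, FINE_TUNED, PRE_TRAINED)
print(ranked[0])
```

In this toy setting, the sentence containing words whose probability rose after fine-tuning ranks first, while a sentence on which the two models agree scores zero. With real LMs, the unigram tables would be replaced by per-token log-likelihoods from each model.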
Computer Science, Theory & Methods; Engineering, Electrical & Electronic
What problem does this paper attempt to address?