MemHunter: Automated and Verifiable Memorization Detection at Dataset-scale in LLMs

Zhenpeng Wu,Jian Lou,Zibin Zheng,Chuan Chen
2024-12-10
Abstract:Large language models (LLMs) have been shown to memorize and reproduce content from their training data, raising significant privacy concerns, especially with web-scale datasets. Existing methods for detecting memorization are largely sample-specific, relying on manually crafted or discretely optimized memory-inducing prompts generated on a per-sample basis, which become impractical for dataset-level detection due to the prohibitive computational cost of iterating over all samples. In real-world scenarios, data owners may need to verify whether a susceptible LLM has memorized their dataset, particularly if the LLM may have collected the data from the web without authorization. To address this, we introduce \textit{MemHunter}, which trains a memory-inducing LLM and employs hypothesis testing to efficiently detect memorization at the dataset level, without requiring sample-specific memory inducing. Experiments on models such as Pythia and Llama-2 demonstrate that \textit{MemHunter} can extract up to 40\% more training data than existing methods under constrained time resources and reduce search time by up to 80\% when integrated as a plug-in. Crucially, \textit{MemHunter} is the first method capable of dataset-level memorization detection, providing an indispensable tool for assessing privacy risks in LLMs that are powered by vast web-sourced datasets.
Cryptography and Security,Machine Learning
What problem does this paper attempt to address?