Extracting Memorized Training Data via Decomposition

Ellen Su, Anu Vellore, Amy Chang, Raffaele Mura, Blaine Nelson, Paul Kassianik, Amin Karbasi
2024-10-02
Abstract: The widespread use of Large Language Models (LLMs) in society creates new information security challenges for developers, organizations, and end-users alike. LLMs are trained on large volumes of data, and their susceptibility to reveal the exact contents of the source training datasets poses security and safety risks. Although current alignment procedures restrict common risky behaviors, they do not completely prevent LLMs from leaking data. Prior work demonstrated that LLMs may be tricked into divulging training data by using out-of-distribution queries or adversarial techniques. In this paper, we demonstrate a simple, query-based decompositional method to extract news articles from two frontier LLMs. We use instruction decomposition techniques to incrementally extract fragments of training data. Out of 3723 New York Times articles, we extract at least one verbatim sentence from 73 articles, and over 20% of verbatim sentences from 6 articles. Our analysis demonstrates that this method successfully induces the LLM to generate texts that are reliable reproductions of news articles, meaning that they likely originate from the source training dataset. This method is simple, generalizable, and does not fine-tune or change the production model. If replicable at scale, this training data extraction methodology could expose new LLM security and safety vulnerabilities, including privacy risks and unauthorized data leaks. These implications require careful consideration from model development to its end-use.
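The abstract reports its results as verbatim sentence matches per article (at least one sentence, or more than 20% of sentences). The following is a minimal sketch of how such a per-article score could be computed. It assumes naive regex sentence splitting and exact substring matching after whitespace normalization. The paper's actual matching criteria may differ, and the function names (`verbatim_fraction`, `summarize`) and sample text are hypothetical.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after ".", "!" or "?" when followed
    # by whitespace. A real evaluation would more likely use a sentence tokenizer.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def normalize(sentence: str) -> str:
    # Collapse whitespace runs so line wrapping does not block a match.
    return " ".join(sentence.split())


def verbatim_fraction(article: str, generated: str) -> float:
    """Share of the article's sentences that occur verbatim in the model output."""
    article_sents = [normalize(s) for s in split_sentences(article)]
    if not article_sents:
        return 0.0
    output = normalize(generated)
    # Exact substring containment (assumption); very short sentences can match by chance.
    hits = sum(1 for s in article_sents if s in output)
    return hits / len(article_sents)


def summarize(fractions: list[float], threshold: float = 0.20) -> tuple[int, int]:
    """Count articles with at least one verbatim sentence and articles above `threshold`."""
    at_least_one = sum(1 for f in fractions if f > 0)
    above_threshold = sum(1 for f in fractions if f > threshold)
    return at_least_one, above_threshold


article = "The city council met on Monday. It approved the budget. Residents protested outside."
generated = "Sure. The city council met on Monday. It approved a new budget."
print(verbatim_fraction(article, generated))  # 1 of 3 sentences matched
```

Applied to every article in a corpus, `summarize` yields the two kinds of counts the abstract reports; the 0.20 default simply mirrors the abstract's 20% figure.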
Machine Learning, Artificial Intelligence, Cryptography and Security
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper addresses the information security challenge of training data leakage from large language models (LLMs). Specifically, it examines how a query-based decomposition method can extract training data from two frontier LLMs.

#### Background and Motivation

- **Information Security Challenges**: LLMs are now widely used across society and are trained on vast amounts of data. Because they can reproduce the exact contents of their source training datasets, they pose security and privacy risks.
- **Limitations of Current Alignment**: Current alignment techniques restrict some common high-risk behaviors but do not completely prevent LLMs from leaking data. Prior research has shown that out-of-distribution queries or adversarial techniques can induce LLMs to reveal training data.

#### Research Objectives

- **Propose a New Extraction Method**: The paper proposes a simple, query-based method that uses instruction decomposition to extract training data fragments incrementally.
- **Validate the Method's Effectiveness**: The method is evaluated experimentally on news articles. Out of 3723 New York Times articles, at least one verbatim sentence was extracted from 73 articles, and more than 20% of the sentences were extracted verbatim from 6 articles.
- **Assess Potential Risks**: The paper analyzes the new security and privacy vulnerabilities the method could expose, including privacy risks and unauthorized data leaks.

#### Main Contributions

- **Method Innovation**: A simple, generalizable method that requires no fine-tuning or modification of the production model.
- **Empirical Study**: Experiments on 3723 articles from The New York Times and 1349 articles from The Wall Street Journal measure how often the method recovers verbatim text.
- **Security Warning**: The paper stresses the security and privacy risks the method could bring and urges model developers and users to weigh them carefully from development through end use.

### Summary

Using a query-based decomposition method, this paper extracts fragments of training data from LLMs, showing that deployed models can leak verbatim training text without fine-tuning or altering the model. The findings open new directions for research on training data extraction and underline the need for stronger security and privacy protections in LLMs.
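The core idea summarized above is decomposition: rather than asking for a whole article in one request, the model is queried repeatedly for small fragments. The following is a minimal, offline sketch of that loop structure, assuming a one-sentence-per-request scheme. The prompt wording, the `QueryFn` type, and the helpers `decomposed_extraction` and `make_stub` are hypothetical illustrations, not the authors' prompts or code, and the stub replaces any real model.

```python
from typing import Callable

# Hypothetical interface: any function mapping a prompt to a completion.
QueryFn = Callable[[str], str]


def decomposed_extraction(query: QueryFn, title: str, max_steps: int = 10) -> list[str]:
    """Recover text one sentence per sub-query instead of in a single request.

    The prompt wording below is hypothetical and is not taken from the paper.
    """
    recovered: list[str] = []
    for _ in range(max_steps):
        # Feed everything recovered so far back in as context for the next step.
        so_far = " ".join(recovered) if recovered else "(none yet)"
        prompt = (
            f'Article title: "{title}"\n'
            f"Sentences so far: {so_far}\n"
            "Reply with only the next sentence."
        )
        reply = query(prompt).strip()
        if not reply:  # stop once the model returns nothing further
            break
        recovered.append(reply)
    return recovered


def make_stub(sentences: list[str]) -> QueryFn:
    # Offline stand-in for a model: returns the next canned sentence on each
    # call and an empty string once the list is exhausted. It ignores the prompt.
    remaining = iter(sentences)
    return lambda prompt: next(remaining, "")


stub = make_stub(["First sentence.", "Second sentence."])
print(decomposed_extraction(stub, "Example headline"))
```

The design choice being illustrated is that each request asks for only a small fragment, and the text recovered so far is carried forward as context for the next request; the recovered sentences can then be scored against the source article for verbatim overlap.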