When Machine Unlearning Meets Retrieval-Augmented Generation (RAG): Keep Secret or Forget Knowledge?

Shang Wang, Tianqing Zhu, Dayong Ye, Wanlei Zhou
2024-10-20
Abstract: The deployment of large language models (LLMs) like ChatGPT and Gemini has shown their powerful natural language generation capabilities. However, these models can inadvertently learn and retain sensitive information and harmful content during training, raising significant ethical and legal concerns. To address these issues, machine unlearning has been introduced as a potential solution. While existing unlearning methods take into account the specific characteristics of LLMs, they often suffer from high computational demands, limited applicability, or the risk of catastrophic forgetting. To address these limitations, we propose a lightweight unlearning framework based on Retrieval-Augmented Generation (RAG) technology. By modifying the external knowledge base of RAG, we simulate the effects of forgetting without directly interacting with the unlearned LLM. We approach the construction of unlearned knowledge as a constrained optimization problem, deriving two key components that underpin the effectiveness of RAG-based unlearning. This RAG-based approach is particularly effective for closed-source LLMs, where existing unlearning methods often fail. We evaluate our framework through extensive experiments on both open-source and closed-source models, including ChatGPT, Gemini, Llama-2-7b-chat-hf, and PaLM 2. The results demonstrate that our approach meets five key unlearning criteria: effectiveness, universality, harmlessness, simplicity, and robustness. Meanwhile, this approach can extend to multimodal large language models and LLM-based agents.
Cryptography and Security, Computation and Language
What problem does this paper attempt to address?
The paper addresses the problem that large language models (LLMs) may inadvertently learn and retain sensitive information or harmful content during training, which can cause ethical and legal issues such as privacy violations, copyright infringement, or the generation of harmful content. To address this, the authors propose a lightweight machine unlearning framework based on Retrieval-Augmented Generation (RAG).

### Specific Problems and Solutions

1. **Problem Description**:
   - **Retention of Sensitive Information and Harmful Content**: LLMs can inadvertently learn sensitive information or harmful content during training, which may lead to privacy leaks, copyright issues, or the generation of inappropriate content.
   - **Limitations of Existing Unlearning Methods**:
     - High computational overhead
     - Limited scope of application
     - Risk of catastrophic forgetting
     - Poor performance on closed-source models
2. **Proposed Solutions**:
   - **Application of RAG Technology**: By modifying the external knowledge base of RAG, the framework simulates the effect of forgetting without directly interacting with the unlearned LLM.
   - **Constrained Optimization Problem**: The construction of unlearned knowledge is treated as a constrained optimization problem, from which two key components are derived (the retrieval component and the constraint component) to ensure the effectiveness of RAG-based unlearning.
   - **Lightweight Framework**: The framework only modifies the knowledge base and does not adjust model parameters, so it applies to both open-source and closed-source models.

### Main Contributions

- **Proposes the first end-to-end RAG-based unlearning framework applicable to a variety of LLMs**.
- **Achieves high-quality sample unlearning and concept unlearning, and performs well across different scenarios**.
- **Outperforms three representative LLM unlearning schemes across five key dimensions: effectiveness, universality, harmlessness, simplicity, and robustness**.
- **Applies to closed-source LLMs, such as GPT-4o, and can be extended to multimodal large language models (MLLMs) and LLM-based agents**.

### Method Overview

1. **RAG Workflow**:
   - The retrieval module retrieves relevant information from the knowledge base.
   - The LLM uses the retrieved information to generate the final response.
2. **Unlearning Process**:
   - Construct unlearned knowledge that includes confidentiality requirements.
   - Use prompt templates to combine the target data with the retrieved knowledge.
   - The LLM generates its response according to the confidentiality requirements in the knowledge, achieving sample and concept unlearning.
3. **Optimization Design**:
   - Formalize the unlearning process as a constrained optimization problem.
   - Generate relevant knowledge containing confidentiality requirements so that the LLM does not generate content about the target.

### Experimental Results

- **Extensive Experimental Verification**: Evaluation on open-source and closed-source models (such as ChatGPT, Gemini, Llama-2-7b-chat-hf, and PaLM 2).
- **Evaluation in Five Key Dimensions**: Effectiveness, universality, harmlessness, simplicity, and robustness.
- **Resistance to Attacks**: Tests of the method against jailbreak attacks and prompt injection attacks.

With this approach, the paper proposes an effective and efficient LLM unlearning framework that overcomes the limitations of existing methods and offers a new way to protect privacy and copyright and to suppress harmful content.
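
The unlearning process in the method overview (retrieve the unlearned knowledge, merge it with the query through a prompt template, and let the LLM follow the confidentiality requirement) can be illustrated with a toy sketch. This is not the authors' implementation: the `UnlearnedKnowledge` class, the bag-of-words similarity, the `0.3` threshold, the prompt wording, and the "Project Bluebird" example are all assumptions made for illustration, and a real system would likely use an embedding-based retriever and the paper's optimized knowledge text.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UnlearnedKnowledge:
    """Hypothetical knowledge-base entry for one target to be forgotten."""
    retrieval_text: str   # retrieval component: text meant to match queries about the target
    constraint_text: str  # constraint component: confidentiality requirement for the LLM


def tokenize(text: str) -> Counter:
    """Lowercased bag of words; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, kb: List[UnlearnedKnowledge],
             threshold: float = 0.3) -> Optional[UnlearnedKnowledge]:
    """Return the best-matching entry, or None if nothing reaches the threshold."""
    q = tokenize(query)
    best, best_score = None, threshold
    for entry in kb:
        score = cosine(q, tokenize(entry.retrieval_text))
        if score >= best_score:
            best, best_score = entry, score
    return best


def build_prompt(query: str, kb: List[UnlearnedKnowledge]) -> str:
    """Combine the query with retrieved unlearned knowledge via a prompt template."""
    entry = retrieve(query, kb)
    if entry is None:
        return query  # unrelated queries reach the LLM unchanged
    return (
        f"Context: {entry.retrieval_text}\n"
        f"Instruction: {entry.constraint_text}\n"
        f"Question: {query}"
    )


# Illustrative knowledge base with a single made-up forgotten target.
demo_kb = [
    UnlearnedKnowledge(
        retrieval_text=("Project Bluebird is an internal codename. Questions about "
                        "Project Bluebird, its budget, or its staff."),
        constraint_text=("This topic is confidential. Decline to provide any "
                         "information about it."),
    )
]

if __name__ == "__main__":
    print(build_prompt("What is the budget of Project Bluebird?", demo_kb))
    print(build_prompt("How do I bake sourdough bread?", demo_kb))
```

The string returned by `build_prompt` is what would be sent to the LLM, open-source or closed-source. No model parameters are touched: restoring the forgotten target in this sketch only requires deleting the entry from `demo_kb`.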