Abstract: Machine unlearning (MU) empowers individuals with the "right to be forgotten" by removing their private or sensitive information encoded in machine learning models. However, it remains uncertain whether MU can be effectively applied to Multimodal Large Language Models (MLLMs), particularly when the goal is to forget leaked visual data of concepts. To overcome this challenge, we propose an efficient method, Single Image Unlearning (SIU), which unlearns the visual recognition of a concept by fine-tuning on a single associated image for a few steps. SIU consists of two key aspects: (i) Constructing multifaceted fine-tuning data. We introduce four targets, based on which we construct fine-tuning data for the concepts to be forgotten; (ii) Joint training loss. To simultaneously forget the visual recognition of concepts and preserve the utility of MLLMs, we fine-tune MLLMs with a novel Dual Masked KL-divergence Loss combined with Cross Entropy loss. Alongside our method, we establish MMUBench, a new benchmark for MU in MLLMs, and introduce a collection of metrics for its evaluation. Experimental results on MMUBench show that SIU outperforms existing methods. Furthermore, we surprisingly find that SIU can resist invasive membership inference attacks and jailbreak attacks. To the best of our knowledge, we are the first to explore MU in MLLMs. We will release the code and benchmark in the near future.

What problem does this paper attempt to address?

### The Problem the Paper Attempts to Solve

This paper attempts to address the issue of effective Machine Unlearning (MU) in Multimodal Large Language Models (MLLMs), particularly in the context of forgetting the recognition of specific visual concepts. Specifically, the paper focuses on the following challenges:

1. **Limited Data**: In real-world scenarios, collecting a sufficient number of target concept images is very difficult, making it impractical to forget all visual knowledge related to the target concept through traditional methods.
2. **Model Degradation**: Large generative models commonly face the problem of model degradation, where after performing unlearning operations, the model may generate meaningless outputs, such as blank or repetitive tokens, thus losing its utility.
3. **Objective Conflict**: During the unlearning operation, there is an objective conflict between Gradient Ascent (GA) and KL-divergence loss. GA aims to make the model stop generating tokens of the target unlearned concept, while KL-divergence seeks to maintain the consistency of the output probability distribution between the unlearned model and the original model, which includes the probability of generating tokens of the target unlearned concept.

To address these challenges, the authors propose an efficient method called Single Image Unlearning (SIU), which enables MLLMs to forget the recognition of specific visual concepts using only one training image. SIU achieves this by constructing multifaceted fine-tuning data and introducing Dual Masked KL-divergence Loss (DMK Loss). Additionally, the authors establish the MMUBench benchmark to evaluate the effectiveness, generalization ability, specificity, fluency, and diversity of machine unlearning methods in MLLMs. Experimental results show that SIU outperforms existing methods across all evaluation metrics and can resist membership inference attacks and jailbreak attacks.
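
The objective conflict described above can be illustrated with a toy example. The summary does not spell out how the DMK Loss is defined, so the following is only a minimal sketch of one plausible reading: a KL term that skips selected sequence positions (a position mask) and drops the target concept's entries from the vocabulary before comparing distributions (a vocabulary mask). All names (`vocab_masked_kl`, `dual_masked_kl`, `position_keep`, `vocab_keep`), the renormalization over kept entries, and the toy logits are assumptions made for illustration, not the authors' implementation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def vocab_masked_kl(ref_logits, new_logits, vocab_keep):
    """KL between original and unlearned models, computed only over kept
    vocabulary entries (assumption: renormalize after dropping masked ones)."""
    ref = [x for x, k in zip(ref_logits, vocab_keep) if k]
    new = [x for x, k in zip(new_logits, vocab_keep) if k]
    return kl(softmax(ref), softmax(new))

def dual_masked_kl(ref_seq, new_seq, position_keep, vocab_keep):
    """Average the vocabulary-masked KL over kept sequence positions."""
    terms = [
        vocab_masked_kl(r, n, vocab_keep)
        for r, n, keep in zip(ref_seq, new_seq, position_keep)
        if keep
    ]
    return sum(terms) / len(terms) if terms else 0.0

# Toy vocabulary of 4 tokens; index 2 is the (hypothetical) concept token.
ref_logits = [1.0, 0.5, 3.0, 0.2]   # original model favors the concept token
new_logits = [1.0, 0.5, -3.0, 0.2]  # unlearned model suppresses only that token

# Plain KL penalizes the suppression, which is the conflict with GA.
full = kl(softmax(ref_logits), softmax(new_logits))
# Masking the concept token out of the vocabulary removes that penalty.
masked = vocab_masked_kl(ref_logits, new_logits, [True, True, False, True])
```

In this toy setting, `full` is strictly positive while `masked` is zero, because the two models agree on every token except the masked one. In the paper's setup this KL term is combined with a Cross Entropy loss; how the two are weighted is not stated in the summary and is left out of the sketch.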

Single Image Unlearning: Efficient Machine Unlearning in Multimodal Large Language Models

Protecting Privacy in Multimodal Large Language Models with MLLMU-Bench

SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation

CLEAR: Character Unlearning in Textual and Visual Modalities

Challenging Forgets: Unveiling the Worst-Case Forget Sets in Machine Unlearning

Rethinking Machine Unlearning for Large Language Models

A Closer Look at Machine Unlearning for Large Language Models

Selective Forgetting: Advancing Machine Unlearning Techniques and Evaluation in Language Models

One-Shot Unlearning of Personal Identities

MUNBa: Machine Unlearning via Nash Bargaining

Forget Vectors at Play: Universal Input Perturbations Driving Machine Unlearning in Image Classification

Learning to Unlearn for Robust Machine Unlearning

EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models

CLIPErase: Efficient Unlearning of Visual-Textual Associations in CLIP

MEOW: MEMOry Supervised LLM Unlearning Via Inverted Facts

MU-Bench: A Multitask Multimodal Benchmark for Machine Unlearning

Unified Gradient-Based Machine Unlearning with Remain Geometry Enhancement

Deep Unlearn: Benchmarking Machine Unlearning

Multimodal Unlearnable Examples: Protecting Data against Multimodal Contrastive Learning

Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset

A hybrid framework for effective and efficient machine unlearning