Data Selection for Transfer Unlearning

Nazanin Mohammadi Sepahvand,Vincent Dumoulin,Eleni Triantafillou,Gintare Karolina Dziugaite
2024-05-17
Abstract: As deep learning models are becoming larger and data-hungrier, there are growing ethical, legal and technical concerns over use of data: in practice, agreements on data use may change over time, rendering previously-used training data impermissible for training purposes. These issues have driven increased attention to machine unlearning: removing "the influence of" a subset of training data from a trained model. In this work, we advocate for a relaxed definition of unlearning that does not address privacy applications but targets a scenario where a data owner withdraws permission of use of their data for training purposes. In this context, we consider the important problem of transfer unlearning, where a pretrained model is transferred to a target dataset that contains some "non-static" data that may need to be unlearned in the future. We propose a new method that uses a mechanism for selecting relevant examples from an auxiliary "static" dataset, and finetunes on the selected data instead of "non-static" target data, addressing all unlearning requests ahead of time. We also adapt a recent relaxed definition of unlearning to our problem setting and demonstrate that our approach is an exact transfer unlearner according to it, while being highly efficient (amortized). We find that our method outperforms the gold standard "exact unlearning" (finetuning on only the "static" portion of the target dataset) on several datasets, especially for small "static" sets, sometimes approaching an upper bound for test accuracy. We also analyze factors influencing the accuracy boost obtained by data selection.
Machine Learning
What problem does this paper attempt to address?
The paper addresses how to transfer a pretrained deep learning model to a target dataset while still supporting efficient unlearning. Specifically, when the target dataset contains "non-static" data, meaning data whose usage rights may be revoked in the future, the goal is to remove the influence of that data on the model without compromising the model's performance. The proposed method selects relevant samples from an auxiliary "static" dataset and fine-tunes on them in place of the "non-static" target data. All unlearning requests are thereby handled ahead of time, with no need to retrain or run approximate unlearning each time a deletion request arrives.

The main contributions of the paper are:

1. **Proposing a new method for transfer unlearning**: The method supports unlearning during transfer by selecting relevant samples from the auxiliary dataset to replace the "non-static" data in the target dataset.
2. **Adapting a relaxed unlearning definition**: The paper adapts a recent relaxed definition of unlearning to this setting and shows that the method is an exact transfer unlearner under it, while being highly efficient (amortized).
3. **Empirical evaluation**: Experiments on several datasets show that the method can outperform the baseline of fine-tuning only on the "static" portion of the target data, especially when the "static" set is small.
4. **Analyzing the factors affecting the accuracy gain**: The paper examines when data selection is most effective and identifies the domain affinity between the auxiliary dataset and the target dataset as an important factor.
In short, the paper asks how to perform transfer learning efficiently while supporting data unlearning when the target dataset contains "non-static" data that may need to be forgotten in the future.
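The idea of replacing "non-static" target data with selected auxiliary data can be illustrated with a small sketch. The text here does not describe the paper's actual selection mechanism, so the scoring rule below (cosine similarity to the mean feature vector of the "static" target examples) is purely an assumption for illustration, as are the function names `select_auxiliary` and `build_finetuning_set` and the toy feature vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_auxiliary(static_feats, aux_feats, k):
    """Return indices of the k auxiliary examples closest to the static centroid.

    Assumption: relevance is cosine similarity to the mean "static" feature
    vector. This is an illustrative stand-in, not the paper's mechanism.
    """
    dim = len(static_feats[0])
    centroid = [sum(f[i] for f in static_feats) / len(static_feats) for i in range(dim)]
    scores = [cosine(f, centroid) for f in aux_feats]
    ranked = sorted(range(len(aux_feats)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def build_finetuning_set(static_ids, aux_ids, selected):
    """Fine-tuning data = static target examples + selected auxiliary examples.

    "Non-static" target examples are never included, so a later deletion
    request for them requires no change to the fine-tuned model.
    """
    return list(static_ids) + [aux_ids[i] for i in selected]


# Toy 2-D features (hypothetical): two static target examples, four auxiliary ones.
static_feats = [[1.0, 0.0], [0.9, 0.1]]
aux_feats = [[0.0, 1.0], [1.0, 0.1], [0.8, 0.0], [-1.0, 0.0]]

selected = select_auxiliary(static_feats, aux_feats, k=2)
train_set = build_finetuning_set(["t0", "t1"], ["a0", "a1", "a2", "a3"], selected)
print(selected, train_set)
```

Because the "non-static" examples never enter `train_set`, the unlearning cost is paid once, up front, which is the sense in which the approach is amortized. In practice the features would presumably come from the pretrained model, though that too is an assumption here.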