A Survey on Machine Unlearning: Techniques and New Emerged Privacy Risks

Hengzhu Liu, Ping Xiong, Tianqing Zhu, Philip S. Yu
2024-06-10
Abstract: The explosive growth of machine learning has made it a critical infrastructure in the era of artificial intelligence. The extensive use of data poses a significant threat to individual privacy. Various countries have implemented corresponding laws, such as GDPR, to protect individuals' data privacy and the right to be forgotten. This has made machine unlearning a research hotspot in the field of privacy protection in recent years, with the aim of efficiently removing the contribution and impact of individual data from trained models. Academic research on machine unlearning has continuously enriched its theoretical foundation, and many methods have been proposed, targeting different data removal requests in various application scenarios. However, researchers have recently found potential privacy leakages in various machine unlearning approaches, making privacy preservation in machine unlearning a critical topic. This paper provides an overview and analysis of the existing research on machine unlearning, aiming to present the current vulnerabilities of machine unlearning approaches. We analyze privacy risks in various aspects, including definitions, implementation methods, and real-world applications. Compared to existing reviews, we analyze the new challenges posed by the latest malicious attack techniques on machine unlearning from the perspective of privacy threats. We hope that this survey can provide an initial but comprehensive discussion of this newly emerging area.
Cryptography and Security
What problem does this paper attempt to address?
This paper surveys the field of machine unlearning and the new privacy risks that have emerged as the field has developed. With the widespread application of machine learning, data privacy protection has become crucial, especially given the "right to be forgotten" required by regulations such as GDPR. Machine unlearning aims to remove the contribution and influence of individual data from trained models. Although its purpose is privacy protection, researchers have found that existing unlearning techniques can themselves leak private information in unexpected ways. For example, by comparing the distribution of model outputs before and after unlearning, an attacker may be able to infer information about the forgotten data, which defeats the purpose of the removal. Current machine unlearning methods can generally be divided into data-driven and model-driven approaches: the former work by modifying the original training set, and the latter by adjusting model parameters. The paper proposes a taxonomy of machine unlearning methods, analyzes the privacy risks associated with each class of method, and discusses possible defense strategies. The authors also review the applications of machine unlearning and their potential privacy issues, and point out future research directions. The paper emphasizes the need for a comprehensive and systematic investigation of privacy risks in machine unlearning, so that researchers, model owners, and data owners can understand both existing and potential risks.
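The output-comparison risk mentioned above can be illustrated with a toy sketch. This is not an attack taken from the paper: the kernel-weighted classifier, the one-dimensional data, the bandwidth, and the function names `kernel_score` and `output_gap` are all assumptions chosen for illustration, and unlearning is simulated by simply rebuilding the model without the deleted record. The sketch only shows why an observer with query access to both model versions might notice which record was removed.

```python
import math

def kernel_score(train, query, bandwidth=0.5):
    """Kernel-weighted share of label-1 training points near `query`.

    `train` is a list of (x, y) pairs with scalar x and y in {0, 1}.
    This toy model is a hypothetical stand-in for a real classifier.
    """
    weights = [
        math.exp(-((query - x) ** 2) / (2 * bandwidth ** 2))
        for x, _ in train
    ]
    positive = sum(w for w, (_, y) in zip(weights, train) if y == 1)
    return positive / sum(weights)

def output_gap(original, unlearned, query):
    """Absolute change in the model output at `query` caused by unlearning."""
    return abs(kernel_score(original, query) - kernel_score(unlearned, query))

# Hypothetical training set; the last record is the one to be forgotten.
original = [(0.0, 0), (1.0, 0), (2.0, 1), (3.0, 1), (1.4, 1)]
target = (1.4, 1)

# Assumption: unlearning is simulated by rebuilding the model without the target.
unlearned = [point for point in original if point != target]

# The observer queries both model versions at two candidate inputs.
gap_target = output_gap(original, unlearned, 1.4)  # the deleted record
gap_other = output_gap(original, unlearned, 2.8)   # an input never in the data

print(f"gap at deleted record:   {gap_target:.4f}")
print(f"gap at unrelated input:  {gap_other:.4f}")
```

In this toy setting the output shifts far more at the deleted record than at the unrelated input, so the difference between the two model versions hints at what was removed. Real models and real attacks are considerably more involved, and inputs close to the deleted record would also show a noticeable shift.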