U Can't Gen This? A Survey of Intellectual Property Protection Methods for Data in Generative AI

Tanja Šarčević, Alicja Karlowicz, Rudolf Mayer, Ricardo Baeza-Yates, Andreas Rauber
2024-04-22
Abstract: Large Generative AI (GAI) models have the unparalleled ability to generate text, images, audio, and other forms of media that are increasingly indistinguishable from human-generated content. Because these models are often trained on publicly available data, including copyrighted material, art, and other creative works, they risk inadvertently violating copyright and misappropriating intellectual property (IP). Driven by the rapid development of generative AI technology and pressing ethical concerns raised by stakeholders, protective mechanisms and techniques are emerging at a high pace but lack systematisation. In this paper, we study the concerns regarding the intellectual property rights of training data and focus specifically on the properties of generative models that enable misuse leading to potential IP violations. We then propose a taxonomy that leads to a systematic review of technical solutions for safeguarding data from intellectual property violations in GAI.
Computers and Society, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the potential infringement of intellectual property (IP) arising from the data used to train Generative AI (GAI) models. Because GAI models are typically trained on publicly available data, which includes copyrighted material, artworks, and other creative works, there is a risk of inadvertently infringing copyrights and misappropriating intellectual property. The paper examines which properties of generative models enable such misuse and proposes a taxonomy to systematically review technical solutions for protecting training data from IP infringement. Its main contributions are:
- Reviewing potential IP infringement scenarios involving GAI training data;
- Systematically reviewing technical solutions that protect the intellectual property of content used to train large GAI models;
- Proposing a classification and taxonomy of IP protection methods;
- Discussing policies and practices surrounding IP issues in GAI.
Through these contributions, the paper aims to fill the current lack of a systematic overview of technical solutions and to give researchers and practitioners a comprehensive perspective for understanding and addressing IP challenges in GAI.