Copyright related risks in the creation and use of ML/AI systems

Daniel M. German
2024-03-27
Abstract: This paper summarizes the current copyright related risks that Machine Learning (ML) and Artificial Intelligence (AI) systems (including Large Language Models, or LLMs) incur. These risks affect different stakeholders: owners of the copyright of the training data, the users of ML/AI systems, the creators of trained models, and the operators of AI systems. This paper also provides an overview of ongoing legal cases in the United States related to these risks.
Software Engineering, Computers and Society
What problem does this paper attempt to address?
The paper explores the copyright-related risks involved in creating and using machine learning (ML) and artificial intelligence (AI) systems. Specifically, it addresses the following core issues:

1. **Copyright issues of training data**:
   - The source and ownership of training data, and whether permission from the copyright owner is required.
   - Whether a trained AI/ML model is a derivative work of the works it was trained on.
   - Whether content generated by AI/ML systems is a derivative work of the works in the training data.
2. **Copyright ownership of generated content**:
   - Who owns the generated content.
   - Whether the generated content qualifies for copyright protection.
3. **Differences in laws across countries and regions**:
   - Copyright law is interpreted and applied differently across countries and regions, and these differences affect the creation and use of ML/AI systems.
   - For example, the "fair use" principle in the United States and the "fair dealing" principle in the United Kingdom handle copyright issues differently.
4. **Impact of litigation**:
   - The potential impact of ongoing copyright-related lawsuits (such as Andersen v. Stability AI Ltd. and Getty Images v. Stability AI Inc.) on the development and use of ML/AI systems.
5. **Risk mitigation measures**:
   - A series of recommendations to help copyright owners, ML/AI system operators, and users mitigate copyright-related legal risks.

Overall, the paper aims to guide developers and users of ML/AI systems in managing copyright-related legal risks by analyzing the current legal framework and ongoing cases.