Abstract: Generative artificial intelligence (AI) systems, together with text and data mining (TDM), introduce complex challenges at the junction of data utilization and copyright law. The inherent reliance of AI on large quantities of data, often encompassing copyrighted materials, results in multifaceted legal quandaries. Issues arise from the infeasibility of securing permission from each copyright holder for AI training, further muddled by ambiguities in interpreting copyright laws and fair use provisions. Adding to the conundrum, the opaque data collection practices of proprietary AI systems prevent copyright owners from detecting unauthorized use of their materials. The paper explores the exceptions to copyright laws for TDM in the European Union, the United Kingdom, and Japan, recognizing their crucial role in fostering AI development. The EU has a two‐pronged approach under the Directive on Copyright in the Digital Single Market, with one exception catering specifically to research organizations, and another, more generalized one, that can be restricted by rightsholders. The UK allows noncommercial TDM research without infringement but rejected a broader copyright exception due to concerns from the creative sector. Japan has the broadest TDM exception globally, permitting the nonenjoyment use of works without permission, though this can potentially overlook the rights of copyright owners. Notably, the applicability of TDM exceptions to AI‐produced copies remains unclear, creating potential legal challenges. Furthermore, an exploration of the fair use doctrine in the United States provides insight into its potential application in AI development. That analysis focuses on the transformative nature of the use and its impact on the potential market for the original work. This exploration underscores the necessity for clear, practical guidelines.
In response to these identified challenges, this paper proposes a hybrid model for TDM exceptions, along with specific recommended mechanisms. The model divides exceptions into noncommercial and commercial uses, providing a nuanced solution to complex copyright issues in AI training. Recommendations include mandatory exceptions for noncommercial uses, an opt‐out clause for commercial uses, enhanced transparency measures, and a searchable portal for copyright owners. In conclusion, striking a delicate equilibrium between technological progress and the incentive for creative expression is of paramount importance. These suggested solutions aim to establish a harmonious foundation that nurtures innovation and creativity while honoring creators' rights, facilitating AI development, promoting transparency, and ensuring fair compensation for creators.

An Economic Solution to Copyright Challenges of Generative AI

Generative AI and Copyright: A Dynamic Perspective

Copyright Protection in Generative AI: A Technical Perspective

Computational Copyright: Towards A Royalty Model for Music Generative AI

Uncertain Boundaries: Multidisciplinary Approaches to Copyright Issues in Generative AI

Copyright Policy Options for Generative Artificial Intelligence

Copyright-Aware Incentive Scheme for Generative Art Models Using Hierarchical Reinforcement Learning

The Research On The Ownership Of Copyright Of AI-generated Content

Assessing the Copyright Infringement Risk of Generative AI Created Works

Managing Copyright Infringement Risks in Generative Artificial Intelligence Data Mining

Talkin' 'Bout AI Generation: Copyright and the Generative-AI Supply Chain

Copyright Protection Against Use of Copyrighted Works Without Permission in AI Machine Learning: Focused on Introducing Blockchain-Based Extended Collective Licensing System

Generative Artificial Intelligence and Copyright: Both Sides of the Black Box

Between Copyright and Computer Science: The Law and Ethics of Generative AI

Copyright Safety for Generative AI

AI Royalties -- an IP Framework to Compensate Artists & IP Holders for AI-Generated Content

U Can't Gen This? A Survey of Intellectual Property Protection Methods for Data in Generative AI

Not All Similarities Are Created Equal: Leveraging Data-Driven Biases to Inform GenAI Copyright Disputes

Rethinking copyright exceptions in the era of generative AI: Balancing innovation and intellectual property protection

Research on the Copyright Ownership and Protection of The Content Generated by Artificial Intelligence

Copyright Protection and Accountability of Generative AI: Attack, Watermarking and Attribution