CopyScope: Model-level Copyright Infringement Quantification in the Diffusion Workflow

Junlei Zhou, Jiashi Gao, Ziwei Wang, Xuetao Wei
2023-10-13
Abstract: Web-based AI image generation has become an innovative art form that can generate novel artworks with the rapid development of the diffusion model. However, this new technique brings potential copyright infringement risks as it may incorporate the existing artworks without the owners' consent. Copyright infringement quantification is the primary and challenging step towards AI-generated image copyright traceability. Previous work only focused on data attribution from the training data perspective, which is unsuitable for tracing and quantifying copyright infringement in practice because of the following reasons: (1) the training datasets are not always available in public; (2) the model provider is the responsible party, not the image. Motivated by this, in this paper, we propose CopyScope, a new framework to quantify the infringement of AI-generated images from the model level. We first rigorously identify pivotal components within the AI image generation pipeline. Then, we propose to take advantage of Fréchet Inception Distance (FID) to effectively capture the image similarity that fits human perception naturally. We further propose the FID-based Shapley algorithm to evaluate the infringement contribution among models. Extensive experiments demonstrate that our work not only reveals the intricacies of infringement quantification but also effectively depicts the infringing models quantitatively, thus promoting accountability in AI image-generation tasks.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses copyright traceability for AI-generated images, specifically how to quantify the degree of copyright infringement within the diffusion model workflow. Web-based AI image generation can create novel artworks, but it also carries a risk of copyright infringement, because these models may incorporate features of existing works without the copyright owners' consent. The paper targets two problems:

1. **Unavailable training data**: In practice, training datasets are not always publicly available, so it is difficult to trace and quantify copyright infringement from the training-data perspective.
2. **Identifying the liable party**: The party liable for copyright infringement should be the model provider (i.e., an individual or organization that misuses online image collections without the owners' consent), rather than the generated image itself. As long as the model that generated the infringing image can be identified, the corresponding infringer can be found and the degree of their infringement can be quantified.

To solve these problems, the authors propose CopyScope, a new framework that quantifies the copyright infringement of AI-generated images at the model level. Through this framework, the authors hope to promote the lawful use of copyrighted works in the emerging field of AI image generation and to strengthen accountability among the parties involved.

### Main contributions

- Proposing a new copyright infringement quantification framework, CopyScope, which works at the model level and helps stakeholders investigate complex infringement cases.
- Proposing a Shapley algorithm based on FID (Fréchet Inception Distance), which uses FID to capture image similarity and combines it with the Shapley value scheme to quantify the contributions of different models to infringement.
- Verifying the effectiveness of the CopyScope framework through extensive experiments, showing that it can quantitatively describe infringing models and thereby promote the lawful use of AI-generated content.

### Method overview

The CopyScope framework consists of three closely connected stages: Identification, Quantification, and Evaluation.

1. **Identification stage**: Through in-depth analysis of 16,000 generated images, four key components (base model, LoRA, ControlNet, and key prompt words) are identified as having an important impact on infringement in the diffusion workflow.
2. **Quantification stage**: Five similarity measures (Cosine, DHash, Hist, SSIM, and FID) are compared, and FID is found to be the most effective because it captures image similarity in a way that matches human perception.
3. **Evaluation stage**: The scenario is modeled as a cooperative game, and a Shapley algorithm based on FID is proposed to evaluate the contribution of each infringing model.

Through these three stages, CopyScope not only reveals the complexity of infringement quantification but also effectively characterizes the infringing models, thereby promoting accountability in AI image generation tasks.
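
The cooperative-game view used in the Evaluation stage can be illustrated with a minimal sketch of exact Shapley value computation over the four components. This is not the paper's implementation: `toy_utility` and all of its numbers are hypothetical, standing in for an FID-derived score (for example, how much a given combination of components lowers the FID between generated images and the copyrighted work). The paper's actual utility definition is not reproduced here.

```python
from itertools import combinations
from math import factorial

# The four components named in the Identification stage.
COMPONENTS = ["base_model", "lora", "controlnet", "prompt"]


def shapley_values(players, utility):
    """Exact Shapley values for a cooperative game.

    players: list of hashable player names.
    utility: function mapping a frozenset of players to a float.
    """
    n = len(players)
    values = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that excludes p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when joining coalition s.
                total += weight * (utility(s | {p}) - utility(s))
        values[p] = total
    return values


def toy_utility(coalition):
    """Hypothetical stand-in for an FID-derived infringement score.

    ASSUMPTION: higher means "closer to the copyrighted work"; the
    numbers are invented for illustration, not measured FID values.
    """
    gain = {"base_model": 10.0, "lora": 30.0, "controlnet": 5.0, "prompt": 15.0}
    score = sum(gain[c] for c in coalition)
    # Invented interaction: LoRA and key prompt words reinforce each other.
    if {"lora", "prompt"} <= coalition:
        score += 20.0
    return score


phi = shapley_values(COMPONENTS, toy_utility)
for name in COMPONENTS:
    print(f"{name}: {phi[name]:.2f}")
```

In this toy game the Shapley values sum to the utility of the full coalition (the efficiency property), and the interaction bonus is split equally between the two components that produce it. Exact enumeration visits every subset, which is cheap for four components but grows exponentially with the number of players.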