Abstract: This work addresses the challenge of quantifying originality in text-to-image (T2I) generative diffusion models, with a focus on copyright originality. We begin by evaluating T2I models' ability to innovate and generalize through controlled experiments, revealing that stable diffusion models can effectively recreate unseen elements with sufficiently diverse training data. Then, our key insight is that concepts and combinations of image elements the model is familiar with, and saw more during training, are more concisely represented in the model's latent space. We hence propose a method that leverages textual inversion to measure the originality of an image based on the number of tokens required for its reconstruction by the model. Our approach is inspired by legal definitions of originality and aims to assess whether a model can produce original content without relying on specific prompts or having the training data of the model. We demonstrate our method using both a pre-trained stable diffusion model and a synthetic dataset, showing a correlation between the number of tokens and image originality. This work contributes to the understanding of originality in generative models and has implications for copyright infringement cases.

### What problem does this paper attempt to solve?

This paper aims to solve the challenge of quantifying originality in text-to-image (T2I) generative diffusion models, with a particular focus on copyright originality. Specifically:

1. **Evaluating the innovation and generalization ability of the model**:
   - Researchers evaluated the innovation and generalization ability of T2I models when facing unseen elements through controlled experiments, revealing that the stable diffusion model can effectively reproduce unseen elements when the training data is diverse enough.
2. **Proposing a method to quantify originality**:
   - The key insight is that the model represents familiar concepts and their combinations more concisely. Therefore, researchers proposed a method based on textual inversion, measuring the originality of an image by the number of tokens required for the model to reconstruct the image.
   - This method is inspired by the legal definition of originality and aims to assess whether the model can generate original content without relying on specific prompts or training data.
3. **Verifying the effectiveness of the method**:
   - Researchers conducted experiments using pre-trained stable diffusion models and synthetic datasets, demonstrating the correlation between the number of tokens and the originality of the image.
   - The experimental results show that for common images, such as Van Gogh's "The Starry Night", only one token is required for accurate reconstruction, while for original images, more tokens are required.
4. **Discussing originality and copyright infringement issues**:
   - The paper also discusses the application of quantifying originality in copyright infringement cases, especially when evaluating the originality of the output content of T2I models trained on large datasets containing copyrighted materials such as LAION-5B.
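The token-count idea in point 2 can be sketched as a small selection step. This is a minimal illustration, not the paper's implementation: the helper `min_tokens_for_reconstruction`, the threshold value, the `lower_is_better` flag, and the toy scores are all hypothetical. It assumes an average reconstruction score has already been computed for each token budget (for example, by running textual inversion with 1, 2, 3, ... tokens) and returns the smallest budget whose score passes the threshold.

```python
from typing import Mapping, Optional


def min_tokens_for_reconstruction(
    avg_score_by_tokens: Mapping[int, float],
    threshold: float,
    lower_is_better: bool = True,
) -> Optional[int]:
    """Return the smallest token count whose score passes the threshold.

    Assumption: scores are precomputed per token budget. Whether a lower or
    a higher score means a closer reconstruction depends on the metric, so
    the direction is a parameter rather than hard-coded.
    """
    for num_tokens in sorted(avg_score_by_tokens):
        score = avg_score_by_tokens[num_tokens]
        passed = score <= threshold if lower_is_better else score >= threshold
        if passed:
            return num_tokens
    return None  # no tested budget reconstructed the image well enough


# Toy, made-up scores (distance-like: lower means closer to the original).
familiar_image = {1: 0.12, 2: 0.10, 3: 0.09}
original_image = {1: 0.55, 2: 0.41, 3: 0.18}

print(min_tokens_for_reconstruction(familiar_image, threshold=0.2))
print(min_tokens_for_reconstruction(original_image, threshold=0.2))
```

Under this reading, a larger returned token count suggests an image that is more original with respect to the model, and `None` means none of the tested budgets was sufficient.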
### Summary

The main contribution of this paper is to provide a new technique to identify generality and originality in generative models and propose a set of synthetic and real-world experimental methods that can be further developed to evaluate and improve generative models. This helps in understanding the originality of generative models and provides a potential application tool for copyright infringement cases.

### Formula examples

To ensure the correctness and readability of the formulas, the following are some key formulas presented in Markdown format:

1. **Encoding and decoding process**:
   \[ z = \mathrm{VAE\_Encoder}(x), \quad x' = \mathrm{VAE\_Decoder}(z) \]
   where \( x' \) is the reconstructed image.
2. **Embedding representation**:
   \[ e_t = \mathrm{TextEncoder}(t) \]
3. **Multi-token representation**:
   \[ S^*_m = e_t(t_1) \, e_t(t_2) \ldots e_t(t_m) \]
4. **Reconstruction score**:
   \[ \mathrm{Reconstruction\ Score}(x'_i) = \mathrm{DreamSim}(x'_i, x) \]
   \[ \mathrm{Average\ Reconstruction\ Score}(T) = \frac{1}{20} \sum_{i = 1}^{20} \mathrm{Reconstruction\ Score}(x'_i) \]

These formulas help explain the working principle and evaluation method of the model.
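The averaging in formula 4 can be illustrated with a few lines of Python. This is a sketch under stated assumptions: `similarity` stands in for DreamSim and is passed in as a plain callable (the toy function below is not DreamSim), and images are represented as lists of floats purely for illustration.

```python
from statistics import mean
from typing import Callable, Sequence

Image = Sequence[float]  # placeholder type; real images would be tensors


def average_reconstruction_score(
    reconstructions: Sequence[Image],
    original: Image,
    similarity: Callable[[Image, Image], float],
) -> float:
    """Mean of similarity(x'_i, x) over all reconstructions x'_i."""
    if not reconstructions:
        raise ValueError("need at least one reconstruction")
    return mean(similarity(x_i, original) for x_i in reconstructions)


def toy_distance(a: Image, b: Image) -> float:
    """Hypothetical stand-in for DreamSim: mean absolute difference."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


original = [0.0, 1.0, 0.5]
# 20 reconstructions, matching the 1/20 factor in the formula.
reconstructions = [[0.0, 1.0, 0.5 + 0.01 * i] for i in range(20)]

score = average_reconstruction_score(reconstructions, original, toy_distance)
print(round(score, 4))
```

Passing the metric in as a callable keeps the averaging step independent of any particular perceptual-similarity library.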

Not Every Image is Worth a Thousand Words: Quantifying Originality in Stable Diffusion

On Copyright Risks of Text-to-Image Diffusion Models

Detecting Image Attribution for Text-to-Image Diffusion Models in RGB and Beyond

Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models

Exploiting Watermark-Based Defense Mechanisms in Text-to-Image Diffusion Models for Unauthorized Data Usage

Diversity and Diffusion: Observations on Synthetic Image Distributions with Stable Diffusion

Measuring the Success of Diffusion Models at Imitating Human Artists

If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection

Good Seed Makes a Good Crop: Discovering Secret Seeds in Text-to-Image Diffusion Models

DIAGNOSIS: Detecting Unauthorized Data Usages in Text-to-image Diffusion Models

Understanding and Mitigating Copying in Diffusion Models

Elucidating Optimal Reward-Diversity Tradeoffs in Text-to-Image Diffusion Models

Model Collapse in the Self-Consuming Chain of Diffusion Finetuning: A Novel Perspective from Quantitative Trait Modeling

A Dataset and Benchmark for Copyright Protection from Text-to-Image Diffusion Models

Image Copy Detection for Diffusion Models

Test-time Conditional Text-to-Image Synthesis Using Diffusion Models

Diffusion Beats Autoregressive: An Evaluation of Compositional Generation in Text-to-Image Models

Measuring the originality of intellectual property assets based on machine learning outputs

Trustworthy Text-to-Image Diffusion Models: A Timely and Focused Survey

GRADE: Quantifying Sample Diversity in Text-to-Image Models

Are Diffusion Models Vision-And-Language Reasoners?