Not Every Image is Worth a Thousand Words: Quantifying Originality in Stable Diffusion

Adi Haviv, Shahar Sarfaty, Uri Hacohen, Niva Elkin-Koren, Roi Livni, Amit H Bermano
2024-08-15
Abstract: This work addresses the challenge of quantifying originality in text-to-image (T2I) generative diffusion models, with a focus on copyright originality. We begin by evaluating T2I models' ability to innovate and generalize through controlled experiments, revealing that stable diffusion models can effectively recreate unseen elements with sufficiently diverse training data. Then, our key insight is that concepts and combinations of image elements the model is familiar with, and saw more during training, are more concisely represented in the model's latent space. We hence propose a method that leverages textual inversion to measure the originality of an image based on the number of tokens required for its reconstruction by the model. Our approach is inspired by legal definitions of originality and aims to assess whether a model can produce original content without relying on specific prompts or having the training data of the model. We demonstrate our method using both a pre-trained stable diffusion model and a synthetic dataset, showing a correlation between the number of tokens and image originality. This work contributes to the understanding of originality in generative models and has implications for copyright infringement cases.
Computer Vision and Pattern Recognition, Machine Learning
### What problem does this paper attempt to solve?

This paper addresses the challenge of quantifying originality in text-to-image (T2I) generative diffusion models, with a particular focus on copyright originality. Specifically:

1. **Evaluating the innovation and generalization ability of the model**:
   - Through controlled experiments, the researchers evaluated how well T2I models innovate and generalize when facing unseen elements. The experiments show that a stable diffusion model can effectively recreate unseen elements when its training data is diverse enough.
2. **Proposing a method to quantify originality**:
   - The key insight is that the model represents familiar concepts and their combinations more concisely in its latent space. The researchers therefore propose a method based on textual inversion that measures the originality of an image by the number of tokens the model requires to reconstruct it.
   - The method is inspired by the legal definition of originality and aims to assess whether the model can generate original content without relying on specific prompts or on access to the model's training data.
3. **Verifying the effectiveness of the method**:
   - The researchers conducted experiments with a pre-trained stable diffusion model and a synthetic dataset, demonstrating a correlation between the number of tokens and the originality of the image.
   - The results show that a common image, such as Van Gogh's "The Starry Night", can be accurately reconstructed with a single token, while original images require more tokens.
4. **Discussing originality and copyright infringement issues**:
   - The paper also discusses how quantified originality could be applied in copyright infringement cases, especially when evaluating the originality of outputs from T2I models trained on large datasets that contain copyrighted material, such as LAION-5B.
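The token-count idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function name `tokens_needed`, the threshold, the token budget, and the score tables are all hypothetical, and it assumes a reconstruction score where lower means closer to the target image. In practice each score would come from running textual inversion with `m` tokens and comparing the generated images to the target.

```python
from typing import Callable, Optional


def tokens_needed(
    score_for_token_count: Callable[[int], float],
    threshold: float,
    max_tokens: int = 5,
) -> Optional[int]:
    """Return the smallest token count whose score is at or below the threshold.

    Assumption: the score behaves like a distance (lower = better reconstruction).
    Returns None if no token count up to max_tokens reaches the threshold.
    """
    for m in range(1, max_tokens + 1):
        if score_for_token_count(m) <= threshold:
            return m
    return None


# Made-up scores for illustration only; these are not results from the paper.
familiar_image_scores = {1: 0.10, 2: 0.09, 3: 0.08, 4: 0.08, 5: 0.07}
original_image_scores = {1: 0.60, 2: 0.45, 3: 0.30, 4: 0.18, 5: 0.12}

familiar_tokens = tokens_needed(familiar_image_scores.__getitem__, threshold=0.2)
original_tokens = tokens_needed(original_image_scores.__getitem__, threshold=0.2)
print(familiar_tokens, original_tokens)
```

Under these assumptions, an image the model is familiar with reaches the threshold with fewer tokens than a more original one, which is the relationship the summary describes.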
### Summary

The main contribution of this paper is a new technique for identifying generality and originality in generative models, together with a set of synthetic and real-world experimental methods that can be further developed to evaluate and improve generative models. This helps in understanding the originality of generative models and provides a potential tool for copyright infringement cases.

### Formula examples

Key formulas from the paper:

1. **Encoding and decoding process**:
   \[ z = \text{VAE\_Encoder}(x), \quad x' = \text{VAE\_Decoder}(z) \]
   where \( x' \) is the reconstructed image.
2. **Embedding representation**:
   \[ e_t = \text{TextEncoder}(t) \]
3. **Multi-token representation**:
   \[ S^*_m = e_t(t_1) \, e_t(t_2) \ldots e_t(t_m) \]
4. **Reconstruction score**:
   \[ \text{Reconstruction Score}(x'_i) = \text{DreamSim}(x'_i, x) \]
   \[ \text{Average Reconstruction Score}(T) = \frac{1}{20} \sum_{i = 1}^{20} \text{Reconstruction Score}(x'_i) \]

These formulas help explain the working principle and evaluation method of the model.
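The averaging step in the reconstruction score can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the authors' code: `mean_abs_distance` is a hypothetical stand-in for DreamSim (a learned perceptual metric that is not reimplemented here), images are represented as flat lists of pixel intensities, and the average is taken over however many reconstructions are passed in, where the formula above uses 20.

```python
from statistics import mean
from typing import Callable, Sequence

Image = Sequence[float]  # flattened pixel intensities, an illustrative simplification


def mean_abs_distance(a: Image, b: Image) -> float:
    """Stand-in for a perceptual metric: mean absolute pixel difference."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def average_reconstruction_score(
    reconstructions: Sequence[Image],
    target: Image,
    distance: Callable[[Image, Image], float] = mean_abs_distance,
) -> float:
    """Average the per-image score distance(x'_i, x) over all reconstructions x'_i."""
    if not reconstructions:
        raise ValueError("at least one reconstruction is required")
    return mean(distance(r, target) for r in reconstructions)


# Tiny two-pixel example: one exact reconstruction, one with a wrong first pixel.
target_image = [0.0, 1.0]
generated = [[0.0, 1.0], [1.0, 1.0]]
score = average_reconstruction_score(generated, target_image)
print(score)
```

Passing the metric as a parameter keeps the averaging logic independent of the choice of similarity measure, so a real perceptual model could be swapped in without changing the rest of the sketch.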