Abstract: Large-scale diffusion models have made significant advancements in the field of image generation, especially through the use of cross-attention mechanisms that guide image formation based on textual descriptions. While text-guided cross-attention in diffusion models has been extensively studied in recent years, its application in image-to-image diffusion models remains underexplored. This paper introduces the Image-to-Image Attribution Maps (I2AM) method, which aggregates patch-level cross-attention scores to enhance the interpretability of latent diffusion models across time steps, heads, and attention layers. I2AM facilitates detailed image-to-image attribution analysis, enabling observation of how diffusion models prioritize key features across time steps and attention heads while generating images from reference images. Through extensive experiments, we first visualize the attribution maps of both generated and reference images, verifying that critical information from the reference image is effectively incorporated into the generated image, and vice versa. To further assess our understanding, we introduce a new evaluation metric tailored for reference-based image inpainting tasks. This metric, measuring the consistency between the attribution maps of generated and reference images, shows a strong correlation with established performance metrics for inpainting tasks, validating the potential use of I2AM in future research endeavors.

What problem does this paper attempt to address?

### What problem does this paper attempt to solve?

This paper, titled "I2AM: Interpreting Image-to-Image Latent Diffusion Models via Attribution Maps," attempts to solve the following problems:

1. **Explaining the image generation process**: Existing research primarily focuses on text-guided diffusion models (text-to-image LDMs), while the interpretability of image-to-image diffusion models (image-to-image LDMs) has been studied far less. This paper aims to enhance the understanding and interpretation of image-to-image diffusion models by introducing a new method: Image-to-Image Attribution Maps (I2AM).
2. **Identifying key features**: The authors aim to understand how the model progressively forms the shape of objects during the image generation process and which key features (such as printed patterns and logos) it focuses on. By analyzing attribution maps at different time steps, attention heads, and attention layers, one can observe how the model prioritizes these key features during image generation.
3. **Evaluating generation quality**: To further assess the quality of the generated images, the authors introduce a new evaluation metric specifically for reference-based image inpainting tasks. This metric measures the similarity between the attribution maps of the generated image and the reference image, validating the model's performance in inpainting tasks.

### Specific Problems and Solutions

1. **Relationship between generated image and reference image**:
   - **Research Question**: Which parts of the generated image are influenced by the reference image?
   - **Solution**: By merging cross-attention maps to generate attribution maps, one can observe which parts of the generated image are influenced by the reference image.
2. **Key areas of the reference image**:
   - **Research Question**: Which parts of the reference image are referenced in the generated image?
   - **Solution**: Bi-directional attribution maps show not only which parts of the generated image are influenced by the reference image, but also which areas of the reference image play a crucial role in forming the generated image.
3. **Model interpretability and reliability**:
   - **Research Question**: How can the interpretability and reliability of the model be improved?
   - **Solution**: By visualizing and analyzing attribution maps, one can better understand the model's decision-making process, thereby improving the model's interpretability and reliability. Additionally, the proposed evaluation metric (IMACS) can help validate the model's performance in practical applications.

### Summary

By introducing the I2AM method, this paper addresses the lack of interpretability in image-to-image diffusion models, providing detailed attribution analysis tools that help researchers and users better understand and evaluate the generation process and performance of these models.
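
The idea of merging cross-attention maps into bi-directional attribution maps can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the `(T, L, H, Q, K)` tensor layout, the plain averaging over time steps, layers, and heads, and the mask-based reductions are all assumptions made for illustration. It also assumes every layer's attention has already been resized to a common patch grid, with queries taken from the generated image and keys from the reference image.

```python
import numpy as np


def mean_cross_attention(attn):
    """Average cross-attention over time steps, layers, and heads.

    attn: array of shape (T, L, H, Q, K) (assumed layout), where Q indexes
    generated-image patches (queries) and K indexes reference-image
    patches (keys). Returns an array of shape (Q, K).
    """
    attn = np.asarray(attn, dtype=np.float64)
    if attn.ndim != 5:
        raise ValueError("expected attention of shape (T, L, H, Q, K)")
    return attn.mean(axis=(0, 1, 2))


def reference_attribution(attn, gen_mask=None):
    """Reference-side map: how strongly each reference patch is attended to.

    gen_mask: optional boolean vector of length Q selecting the generated
    patches of interest (hypothetical parameter, e.g. an inpainted region).
    Returns a vector of length K.
    """
    avg = mean_cross_attention(attn)
    if gen_mask is not None:
        avg = avg[np.asarray(gen_mask, dtype=bool)]
    return avg.mean(axis=0)


def generated_attribution(attn, ref_mask):
    """Generated-side map: attention mass each generated patch places on
    the reference patches selected by ref_mask (boolean vector of length K).
    Returns a vector of length Q.
    """
    avg = mean_cross_attention(attn)
    return avg[:, np.asarray(ref_mask, dtype=bool)].sum(axis=1)
```

Either output vector can be reshaped to the patch grid (for example `reference_attribution(attn).reshape(h, w)`) and overlaid on the corresponding image. Skipping the average over one axis, such as time steps, would give the per-time-step or per-head views described above.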

I2AM: Interpreting Image-to-Image Latent Diffusion Models via Attribution Maps
