Abstract: We study $\textit{how}$ rich visual semantic information is represented within various layers and denoising timesteps of different diffusion architectures. We uncover monosemantic interpretable features by leveraging k-sparse autoencoders (k-SAE). We substantiate our mechanistic interpretations via transfer learning using light-weight classifiers on off-the-shelf diffusion models' features. On $4$ datasets, we demonstrate the effectiveness of diffusion features for representation learning. We provide in-depth analysis of how different diffusion architectures, pre-training datasets, and language model conditioning impact visual representation granularity, inductive biases, and transfer learning capabilities. Our work is a critical step towards deepening interpretability of black-box diffusion models. Code and visualizations are available at: https://github.com/revelio-diffusion/revelio

What problem does this paper attempt to address?

This paper addresses the problem of how to understand and interpret the internal representations of diffusion models. Specifically, the authors focus on the following key questions:

1. **Representation of visual-semantic information**:
   - What types of visual information does a diffusion model capture at different layers and denoising timesteps?
   - How do these pieces of information interact and complement one another in the overall learned visual representation?
2. **Influence of external conditioning**:
   - Do different layers benefit differently from external conditioning (such as conditioning inputs from language models)? Why?
3. **Influence of model architecture and training data**:
   - How do different diffusion model architectures (for example, convolution-based vs. Transformer-based models) and pre-training datasets affect the granularity of visual representations, inductive biases, and transfer-learning capabilities?
4. **Improving interpretability**:
   - How can mechanistic interpretability techniques reveal the visual knowledge held in the internal states of diffusion models and make these black-box models more interpretable?

To answer these questions, the authors use k-sparse autoencoders (k-SAE), verify their findings by training lightweight classifiers, and analyze experimental results on multiple datasets. They also compare different model architectures (such as different versions of Stable Diffusion and DeepFloyd-IF) and examine the differences between pixel-space and latent-space models.

### Main contributions

1. **Proposed a new interpretation method**: Uses k-SAE to reveal monosemantic features in diffusion models.
2. **Verified the change in the granularity of visual information**: Shows how the granularity of visual information changes across layers and timesteps, and explains why.
3. **Designed a lightweight classifier, Diff-C**: This classifier performs well on multiple tasks and does not require additional loss functions or complex training procedures.
4. **Provided detailed experimental analysis**: Through extensive experiments on multiple datasets and model architectures, demonstrates the effectiveness and generality of diffusion features.

In summary, this paper takes an important step toward understanding how diffusion models work internally and offers a basis for designing more efficient algorithms.
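The k-sparse autoencoder idea mentioned in the summary can be illustrated with a minimal forward-pass sketch in NumPy. This is not the authors' implementation: the class name `KSparseAutoencoder`, the layer sizes, the random initialization, and the random stand-in for diffusion features are all assumptions made for illustration, and the paper's actual k-SAE may differ in architecture and training details.

```python
import numpy as np


def topk_activation(z: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest entries in each row of z and zero out the rest."""
    # argpartition moves the indices of the k largest values into the last k slots.
    idx = np.argpartition(z, -k, axis=1)[:, -k:]
    out = np.zeros_like(z)
    np.put_along_axis(out, idx, np.take_along_axis(z, idx, axis=1), axis=1)
    return out


class KSparseAutoencoder:
    """Hypothetical k-SAE: linear encoder, top-k sparsity, linear decoder."""

    def __init__(self, d_in: int, d_hidden: int, k: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.k = k
        self.W_enc = rng.normal(0.0, d_in ** -0.5, size=(d_in, d_hidden))
        self.b_enc = np.zeros(d_hidden)
        self.W_dec = rng.normal(0.0, d_hidden ** -0.5, size=(d_hidden, d_in))
        self.b_dec = np.zeros(d_in)

    def encode(self, x: np.ndarray) -> np.ndarray:
        # Only k hidden units per input stay active after the top-k step.
        return topk_activation(x @ self.W_enc + self.b_enc, self.k)

    def decode(self, h: np.ndarray) -> np.ndarray:
        return h @ self.W_dec + self.b_dec

    def reconstruction_loss(self, x: np.ndarray) -> float:
        # Mean squared error between the input and its reconstruction.
        return float(np.mean((self.decode(self.encode(x)) - x) ** 2))


# Assumed stand-in for diffusion features: 8 random vectors of width 64.
features = np.random.default_rng(1).normal(size=(8, 64))
sae = KSparseAutoencoder(d_in=64, d_hidden=256, k=16)
codes = sae.encode(features)
active_per_row = (codes != 0).sum(axis=1)
```

Each row of `codes` has at most `k` non-zero entries. In a trained model, one would inspect which inputs most strongly activate a given hidden unit to judge whether that unit corresponds to a single interpretable concept.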

$\textit{Revelio}$: Interpreting and leveraging semantic information in diffusion models
