$\textit{Revelio}$: Interpreting and leveraging semantic information in diffusion models

Dahye Kim, Xavier Thomas, Deepti Ghadiyaram
DOI: https://doi.org/10.48550/arXiv.2411.16725
2024-11-23
Abstract: We study $\textit{how}$ rich visual semantic information is represented within various layers and denoising timesteps of different diffusion architectures. We uncover monosemantic interpretable features by leveraging k-sparse autoencoders (k-SAE). We substantiate our mechanistic interpretations via transfer learning using light-weight classifiers on off-the-shelf diffusion models' features. On $4$ datasets, we demonstrate the effectiveness of diffusion features for representation learning. We provide in-depth analysis of how different diffusion architectures, pre-training datasets, and language model conditioning impact visual representation granularity, inductive biases, and transfer learning capabilities. Our work is a critical step towards deepening interpretability of black-box diffusion models. Code and visualizations are available at: https://github.com/revelio-diffusion/revelio
Computer Vision and Pattern Recognition
### What problem does this paper attempt to address?

This paper addresses the problem of understanding and interpreting the internal representations of diffusion models. Specifically, the authors focus on the following questions:

1. **Representation of visual semantic information**:
   - What types of visual information do diffusion models capture at different layers and denoising timesteps?
   - How do these pieces of information interact with and complement one another in the overall learned visual representation?
2. **Influence of external conditioning**:
   - Do different layers benefit differently from external conditioning (such as conditioning inputs from language models), and why?
3. **Influence of model architecture and training data**:
   - How do different diffusion architectures (for example, convolution-based vs. Transformer-based models) and pre-training datasets affect the granularity of visual representations, inductive biases, and transfer-learning capabilities?
4. **Improving interpretability**:
   - How can mechanistic interpretability techniques reveal the visual knowledge held in the internal states of diffusion models and make these black-box models more interpretable?

To answer these questions, the authors apply k-sparse autoencoders (k-SAE), corroborate their findings by training lightweight classifiers, and analyze experimental results on multiple datasets. They also compare different model architectures (such as several versions of Stable Diffusion and DeepFloyd-IF) and contrast pixel-space models with latent-space models.

### Main contributions

1. **Proposed a new interpretation method**: Uses k-SAE to reveal monosemantic features in diffusion models.
2. **Verified the change in the granularity of visual information**: Shows how the granularity of visual information changes across layers and timesteps, and explains why.
3. **Designed a lightweight classifier, Diff-C**: The classifier performs well on multiple tasks and does not require additional loss functions or complex training procedures.
4. **Provided detailed experimental analysis**: Through extensive experiments on multiple datasets and model architectures, demonstrates the effectiveness and generality of diffusion features.

In summary, the paper is an important step toward understanding the inner workings of diffusion models and offers support for designing more efficient algorithms.
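
To make the k-SAE idea concrete, here is a minimal NumPy sketch of the forward pass of a generic k-sparse autoencoder: a linear encoder, a top-k activation that keeps only the k largest hidden units per input, and a linear decoder. It is not the authors' implementation. The class name `KSparseAutoencoder`, the helper `topk_activation`, the layer sizes, the random initialization, and the choice to subtract the decoder bias before encoding are all assumptions made for illustration, and random vectors stand in for real diffusion features. Training (minimizing the reconstruction loss by gradient descent) is omitted.

```python
import numpy as np


def topk_activation(z, k):
    """Keep the k largest entries in each row of z and zero out the rest."""
    # Column indices of the k largest values per row (their order is arbitrary).
    idx = np.argpartition(z, -k, axis=1)[:, -k:]
    out = np.zeros_like(z)
    rows = np.arange(z.shape[0])[:, None]
    out[rows, idx] = z[rows, idx]
    return out


class KSparseAutoencoder:
    """Illustrative k-sparse autoencoder (forward pass only, untrained)."""

    def __init__(self, d_in, d_hidden, k, seed=0):
        rng = np.random.default_rng(seed)
        self.k = k
        # Hypothetical initialization; a trained model would learn these weights.
        self.W_enc = rng.normal(0.0, 1.0 / np.sqrt(d_in), size=(d_in, d_hidden))
        self.b_enc = np.zeros(d_hidden)
        self.W_dec = rng.normal(0.0, 1.0 / np.sqrt(d_hidden), size=(d_hidden, d_in))
        self.b_dec = np.zeros(d_in)

    def encode(self, x):
        # Linear projection followed by the top-k sparsity constraint.
        pre = (x - self.b_dec) @ self.W_enc + self.b_enc
        return topk_activation(pre, self.k)

    def decode(self, z):
        # Reconstruct the input from the sparse code.
        return z @ self.W_dec + self.b_dec

    def reconstruction_loss(self, x):
        # Mean squared error between the input and its reconstruction.
        x_hat = self.decode(self.encode(x))
        return float(np.mean((x - x_hat) ** 2))


# Stand-in for diffusion features: 4 random vectors of dimension 16.
features = np.random.default_rng(1).normal(size=(4, 16))
sae = KSparseAutoencoder(d_in=16, d_hidden=64, k=8)
codes = sae.encode(features)
print(codes.shape, np.count_nonzero(codes, axis=1))
```

Because at most k hidden units are active for any input, each unit is pushed to specialize during training, which is what makes individual units candidates for monosemantic, human-interpretable features.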