Diffusion Models Learn Low-Dimensional Distributions via Subspace Clustering

Peng Wang, Huijie Zhang, Zekai Zhang, Siyi Chen, Yi Ma, Qing Qu

2024-09-04

Abstract: Recent empirical studies have demonstrated that diffusion models can effectively learn the image distribution and generate new samples. Remarkably, these models can achieve this even with a small number of training samples despite a large image dimension, circumventing the curse of dimensionality. In this work, we provide theoretical insights into this phenomenon by leveraging key empirical observations: (i) the low intrinsic dimensionality of image data, (ii) a union of manifold structure of image data, and (iii) the low-rank property of the denoising autoencoder in trained diffusion models. These observations motivate us to assume the underlying data distribution of image data as a mixture of low-rank Gaussians and to parameterize the denoising autoencoder as a low-rank model according to the score function of the assumed distribution. With these setups, we rigorously show that optimizing the training loss of diffusion models is equivalent to solving the canonical subspace clustering problem over the training samples. Based on this equivalence, we further show that the minimal number of samples required to learn the underlying distribution scales linearly with the intrinsic dimensions under the above data and model assumptions. This insight sheds light on why diffusion models can break the curse of dimensionality and exhibit the phase transition in learning distributions. Moreover, we empirically establish a correspondence between the subspaces and the semantic representations of image data, facilitating image editing. We validate these results with corroborated experimental results on both simulated distributions and image datasets.
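
For concreteness, the mixture-of-low-rank-Gaussians assumption named in the abstract can be written out as below. This is only a sketch: the symbols ($n$ for the ambient dimension, $K$ for the number of components, $d_k$, $\pi_k$, $U_k$) are chosen here for illustration and may differ from the paper's own notation.

$$
x_0 \sim \sum_{k=1}^{K} \pi_k \, \mathcal{N}\!\left(0,\; U_k U_k^\top\right),
\qquad U_k \in \mathbb{R}^{n \times d_k},\quad U_k^\top U_k = I_{d_k},\quad \pi_k \ge 0,\quad \sum_{k=1}^{K} \pi_k = 1 .
$$

Each component has covariance of rank $d_k$ and is supported on the subspace spanned by the columns of $U_k$, so when $d_k \ll n$ the data has intrinsic dimension far below the ambient dimension.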

Machine Learning, Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

### The Problem the Paper Attempts to Solve

This paper asks: **How can diffusion models learn the underlying data distribution without suffering from the curse of dimensionality?**

The analysis builds on three empirical observations:

1. **Low intrinsic dimensionality of image data**: The intrinsic dimension of real image data is much lower than its ambient dimension.
2. **Union-of-manifolds structure of image data**: Image data lies on a union of manifolds of different dimensions.
3. **Low-rank property of the denoising autoencoder**: The denoising autoencoders in trained diffusion models exhibit a low-rank structure.

Based on these observations, the authors model the image data distribution as a mixture of low-rank Gaussians (MoLRG) and parameterize the denoising autoencoder as a low-rank model according to the score function of that distribution. Under this setup, the paper shows that optimizing the training loss of diffusion models is equivalent to solving a subspace clustering problem over the training samples, and further shows that the minimal number of samples required to learn the distribution scales linearly with the intrinsic dimension of the data. This explains why diffusion models can overcome the curse of dimensionality and why they exhibit a phase transition from failure to success in learning the distribution.

Additionally, the authors find that the low-dimensional subspaces in pre-trained diffusion models carry semantic meaning, which provides a training-free method for image editing. Experiments on simulated distributions and image datasets support the theoretical results.

In summary, the paper combines theoretical analysis with experimental validation to offer a new perspective on how diffusion models learn complex data distributions.
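
The link between the MoLRG model, the denoiser, and subspace clustering described above can be illustrated with a small NumPy sketch. It is not the authors' code. It assumes equal-dimension subspaces, a forward process of the form `x_t = s_t * x_0 + gamma_t * eps`, and known bases; all function and parameter names (`random_subspaces`, `sample_molrg`, `molrg_denoiser`, `assign_subspace`, `s_t`, `gamma_t`) are hypothetical. The denoiser is the standard Gaussian-mixture posterior mean `E[x_0 | x_t]`, which reduces to a softmax-weighted sum of scaled subspace projections.

```python
import numpy as np


def random_subspaces(n, d, K, rng):
    """Draw K random orthonormal bases U_k in R^{n x d} (QR of Gaussian matrices)."""
    return [np.linalg.qr(rng.standard_normal((n, d)))[0] for _ in range(K)]


def sample_molrg(bases, weights, N, rng):
    """Sample N points: pick component k ~ weights, then x = U_k a with a ~ N(0, I_d)."""
    labels = rng.choice(len(bases), size=N, p=weights)
    d = bases[0].shape[1]
    coeffs = rng.standard_normal((N, d))
    X = np.stack([bases[k] @ a for k, a in zip(labels, coeffs)])
    return X, labels


def molrg_denoiser(x_t, bases, weights, s_t, gamma_t):
    """Posterior mean E[x_0 | x_t] under the MoLRG prior (assumes equal subspace dims)."""
    # Per-component log-posterior weight, up to a constant shared by all components.
    phi = s_t**2 / (2.0 * gamma_t**2 * (s_t**2 + gamma_t**2))
    proj = np.stack([U.T @ x_t for U in bases])          # shape (K, d)
    logits = np.log(weights) + phi * np.sum(proj**2, axis=1)
    logits -= logits.max()                               # numerically stable softmax
    w = np.exp(logits)
    w /= w.sum()
    # Each component contributes a shrunken projection onto its own subspace.
    shrink = s_t / (s_t**2 + gamma_t**2)
    return shrink * sum(wk * (U @ p) for wk, U, p in zip(w, bases, proj))


def assign_subspace(X, bases):
    """Clustering-style assignment: the basis with the largest projection energy."""
    energy = np.stack([np.sum((X @ U) ** 2, axis=1) for U in bases], axis=1)
    return energy.argmax(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bases = random_subspaces(n=50, d=3, K=3, rng=rng)
    weights = np.array([0.5, 0.3, 0.2])
    X, labels = sample_molrg(bases, weights, N=200, rng=rng)
    accuracy = np.mean(assign_subspace(X, bases) == labels)
    print("assignment accuracy with known bases:", accuracy)
```

In this toy setting the bases are given, so assignment is easy. The paper's claim concerns the harder problem of learning the bases from training samples, which the sketch does not attempt.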
