Learning Mixtures of Gaussians Using the DDPM Objective

Kulin Shah, Sitan Chen, Adam Klivans
2023-07-04
Abstract: Recent works have shown that diffusion models can learn essentially any distribution provided one can perform score estimation. Yet it remains poorly understood under what settings score estimation is possible, let alone when practical gradient-based algorithms for this task can provably succeed. In this work, we give the first provably efficient results along these lines for one of the most fundamental distribution families, Gaussian mixture models. We prove that gradient descent on the denoising diffusion probabilistic model (DDPM) objective can efficiently recover the ground truth parameters of the mixture model in the following two settings: 1) We show gradient descent with random initialization learns mixtures of two spherical Gaussians in $d$ dimensions with $1/\text{poly}(d)$-separated centers. 2) We show gradient descent with a warm start learns mixtures of $K$ spherical Gaussians with $\Omega(\sqrt{\log(\min(K,d))})$-separated centers. A key ingredient in our proofs is a new connection between score-based methods and two other approaches to distribution learning, the EM algorithm and spectral methods.
Data Structures and Algorithms, Machine Learning
What problem does this paper attempt to address?
This paper asks whether, for Gaussian mixture models (GMMs), gradient descent on the denoising diffusion probabilistic model (DDPM) objective can provably achieve accurate score estimation. Specifically, the authors prove that under certain conditions, gradient descent on the DDPM objective efficiently learns the true parameters of the mixture.

### Problem Background

In recent years, diffusion models have received extensive attention as a powerful generative modeling framework. Their core idea is to learn a distribution through denoising, or equivalently through score estimation, where the score is the gradient of the log-density of the data distribution. The DDPM objective is a commonly used score-matching objective; it is optimized by minimizing the difference between the predicted noise and the actual noise.

Although much theoretical work has established the effectiveness of diffusion models under certain assumptions, most of it relies on the existence of an "oracle" for score estimation and does not show how to obtain provably accurate score estimates for interesting distribution families such as Gaussian mixture models. A key question is therefore: **Are there natural data distributions under which gradient descent can be proven to achieve accurate score estimation?**

### Research Contributions

The authors focus on Gaussian mixture models and prove two main results:

1. **Theorem 1 (informal statement)**: For a mixture of two spherical Gaussians in \( d \) dimensions whose centers are \( 1/\text{poly}(d) \)-separated, gradient descent on the DDPM objective, started from a random initialization, efficiently learns the true parameters of the model.
2. **Theorem 2 (informal statement)**: For a mixture of \( K \) spherical Gaussians whose centers are \( \Omega(\sqrt{\log(\min(K,d))}) \)-separated, gradient descent on the DDPM objective, started from a warm start (an initialization close to the true centers), efficiently learns the true parameters of the model.

### Technical Overview

To prove these results, the authors relate the behavior of gradient descent at different noise levels to two classic algorithms, power iteration and the expectation-maximization (EM) algorithm:

- **Large noise level**: Gradient descent behaves like power iteration, which helps find an iterate pointing in the same direction as the true parameters.
- **Small noise level**: Gradient descent behaves like the M-step update of the EM algorithm, so it converges quickly to the true parameters.

The authors also discuss how to handle smaller separations between the centers and how to extend the analysis to the general case of \( K \) Gaussians.

### Conclusion

The main contribution of this paper is the first provably efficient result for optimizing the DDPM objective by gradient descent in the setting of Gaussian mixture models. This deepens our understanding of diffusion models and provides new perspectives and tools for score estimation in practical applications.
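
To make the two-component setting of Theorem 1 more concrete, here is a minimal NumPy sketch of gradient descent on an empirical DDPM objective at a single fixed noise level. It is an illustration under stated assumptions, not the authors' algorithm or code. It assumes a symmetric mixture with centers \( \pm\mu^* \) and identity covariance, the forward noising \( x_t = e^{-t} x_0 + \sqrt{1 - e^{-2t}}\, z \), and a student score of the form \( \tanh(\theta_t^\top x)\,\theta_t - x \) with \( \theta_t = e^{-t}\theta \), which matches the closed-form score of such a mixture when \( \theta = \mu^* \). The function names, step size, and noise level are hypothetical choices made for this example.

```python
import numpy as np


def sample_noised(mu_star, t, n, rng):
    """Draw n samples from 0.5*N(mu*, I) + 0.5*N(-mu*, I) and noise them to level t."""
    d = mu_star.shape[0]
    signs = rng.choice([-1.0, 1.0], size=(n, 1))        # mixture component labels
    x0 = signs * mu_star + rng.standard_normal((n, d))  # clean samples
    z = rng.standard_normal((n, d))                     # injected noise
    sigma = np.sqrt(1.0 - np.exp(-2.0 * t))
    xt = np.exp(-t) * x0 + sigma * z                    # assumed forward process
    return xt, z, sigma


def ddpm_loss_and_grad(theta, xt, z, sigma, t):
    """Empirical DDPM loss mean ||s_theta(x_t) + z/sigma||^2 and its gradient in theta."""
    theta_t = np.exp(-t) * theta
    a = xt @ theta_t                                    # inner products theta_t . x
    th = np.tanh(a)
    resid = th[:, None] * theta_t - xt + z / sigma      # s_theta(x_t) + z/sigma
    loss = np.mean(np.sum(resid ** 2, axis=1))
    # Chain rule through s_theta(x) = tanh(theta_t . x) * theta_t - x.
    inner = resid @ theta_t
    grad_t = 2.0 * np.mean(
        th[:, None] * resid + ((1.0 - th ** 2) * inner)[:, None] * xt, axis=0
    )
    return loss, np.exp(-t) * grad_t                    # d theta_t / d theta = e^{-t}


def run_gradient_descent(xt, z, sigma, t, theta0, lr=0.05, steps=1000):
    """Plain full-batch gradient descent on the empirical DDPM loss."""
    theta = np.array(theta0, dtype=float)
    losses = []
    for _ in range(steps):
        loss, grad = ddpm_loss_and_grad(theta, xt, z, sigma, t)
        losses.append(loss)
        theta = theta - lr * grad
    return theta, losses
```

A typical use is to call `sample_noised` once, pass the result to `run_gradient_descent` with a small random `theta0`, and compare the returned `theta` with \( \pm\mu^* \); the sign is not identifiable because the mixture is symmetric. Increasing `t` moves the sketch toward the large-noise regime that the overview compares to power iteration, while decreasing it moves toward the small-noise, EM-like regime.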