Diffusion Models With Learned Adaptive Noise

Subham Sekhar Sahoo, Aaron Gokaslan, Chris De Sa, Volodymyr Kuleshov
2024-06-05
Abstract: Diffusion models have gained traction as powerful algorithms for synthesizing high-quality images. Central to these algorithms is the diffusion process, a set of equations which maps data to noise in a way that can significantly affect performance. In this paper, we explore whether the diffusion process can be learned from data. Our work is grounded in Bayesian inference and seeks to improve log-likelihood estimation by casting the learned diffusion process as an approximate variational posterior that yields a tighter lower bound (ELBO) on the likelihood. A widely held assumption is that the ELBO is invariant to the noise process: our work dispels this assumption and proposes multivariate learned adaptive noise (MULAN), a learned diffusion process that applies noise at different rates across an image. Specifically, our method relies on a multivariate noise schedule that is a function of the data to ensure that the ELBO is no longer invariant to the choice of the noise schedule as in previous works. Empirically, MULAN sets a new state-of-the-art in density estimation on CIFAR-10 and ImageNet and reduces the number of training steps by 50%. Code is available at https://github.com/s-sahoo/MuLAN
Machine Learning, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper asks whether the diffusion process itself can be learned from data in order to improve a diffusion model's performance. Specifically, the authors explore whether learning how noise is added during diffusion can improve the log-likelihood estimates of diffusion-based generative models. Traditional diffusion models usually adopt a noise schedule that is fixed or governed by hyperparameters, which may not fully exploit the characteristics of the dataset and can therefore limit performance. The paper proposes a new diffusion process, Multivariate Learned Adaptive Noise (MULAN), which learns to add Gaussian noise at different rates in different parts of an image. This breaks the earlier assumption that the Evidence Lower Bound (ELBO) is invariant to the choice of noise process. In experiments, the method significantly improves density estimation while reducing the time required for training. The main contributions of the paper are:
1. Showing that the ELBO of a diffusion model is not invariant to all types of noise process, which overturns a common assumption in the field.
2. Introducing MULAN, a learned noise process that adds multivariate Gaussian noise at different rates at different positions in the image, conditioned on context (including the image itself).
3. Showing experimentally that learning the diffusion process accelerates training, matches the previous state of the art with less than half of the computing resources, and sets a new state of the art in density estimation on CIFAR-10 and ImageNet.
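
The idea of adding noise at different rates in different parts of an image can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy 2x4 "image", and the hand-set rate values are all hypothetical, and the variance-preserving form `alpha^2 = sigmoid(-gamma)`, `sigma^2 = sigmoid(gamma)` is an assumption borrowed from variational diffusion models rather than something stated in the summary. In MULAN the per-pixel rates would be learned and would depend on the data; here they are fixed constants so that the contrast with a single scalar rate is easy to see.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def forward_noise(x, gamma, rng):
    """Noise each pixel with its own rate: z = alpha * x + sigma * eps.

    x and gamma are lists of rows with the same shape. A larger gamma
    means more noise at that pixel (assumed parameterization).
    """
    z, alphas = [], []
    for x_row, g_row in zip(x, gamma):
        a_row = [math.sqrt(sigmoid(-g)) for g in g_row]  # signal scale
        s_row = [math.sqrt(sigmoid(g)) for g in g_row]   # noise scale
        z.append([a * xi + s * rng.gauss(0.0, 1.0)
                  for a, s, xi in zip(a_row, s_row, x_row)])
        alphas.append(a_row)
    return z, alphas


rng = random.Random(0)
x = [[0.5, -0.5, 0.25, -0.25],
     [1.0, -1.0, 0.75, -0.75]]  # toy "image" with values in [-1, 1]

# Scalar schedule: every pixel shares one noise level.
gamma_scalar = [[0.0] * 4 for _ in x]

# Multivariate schedule (hand-set here, learned in MULAN):
# the left half is noised lightly, the right half heavily.
gamma_multi = [[-4.0, -4.0, 4.0, 4.0] for _ in x]

z_scalar, alpha_scalar = forward_noise(x, gamma_scalar, rng)
z_multi, alpha_multi = forward_noise(x, gamma_multi, rng)
```

With the scalar schedule every pixel keeps the same fraction of its signal, while with the multivariate schedule the left half of the image stays close to the data and the right half is almost pure noise at the same time step.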