Binary Latent Diffusion

Ze Wang, Jiang Wang, Zicheng Liu, Qiang Qiu
2023-04-11
Abstract: In this paper, we show that a binary latent space can be explored for compact yet expressive image representations. We model the bi-directional mappings between an image and the corresponding latent binary representation by training an auto-encoder with a Bernoulli encoding distribution. On the one hand, the binary latent space provides a compact discrete image representation whose distribution can be modeled more efficiently than pixels or continuous latent representations. On the other hand, we now represent each image patch as a binary vector instead of an index into a learned codebook, as in discrete image representations with vector quantization. In this way, we obtain binary latent representations that allow for better image quality and high-resolution image representations without any multi-stage hierarchy in the latent space. In this binary latent space, images can now be generated effectively using a binary latent diffusion model tailored specifically for modeling the prior over the binary image representations. We present both conditional and unconditional image generation experiments with multiple datasets, and show that the proposed method performs comparably to state-of-the-art methods while dramatically improving the sampling efficiency to as few as 16 steps without using any test-time acceleration. The proposed framework can also be seamlessly scaled to $1024 \times 1024$ high-resolution image generation without resorting to latent hierarchy or multi-stage refinements.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper primarily aims to address the following issues:

1. **Efficiently generating high-quality images**: Investigating how to reduce computational costs in the generation process while maintaining or improving image quality, especially in high-resolution image generation.
2. **Overcoming the limitations of existing models**: Proposing new solutions to address problems in existing generative models (such as GANs and VAEs), including unstable training and insufficient mode coverage.
3. **Exploring compact and expressive image representation methods**: Using a binary latent space to represent images, achieving more efficient distribution modeling and effective image generation.
4. **Improving diffusion models**: Developing diffusion models specifically for the binary latent space to better handle image generation tasks, particularly in modeling multivariate Bernoulli distributions.

Specifically, the paper proposes a framework based on a binary latent space for learning compact binary representations of images and develops a binary latent diffusion model on this basis, capable of generating high-quality images in fewer diffusion steps. Additionally, this method can effectively scale to high-resolution image generation tasks without the need for complex hierarchical structures or multi-stage refinement processes.

In summary, the goal of the paper is to improve the quality and efficiency of image generation by introducing a novel binary latent space representation and a corresponding diffusion model, while addressing some limitations of existing models.
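To make the two ideas above more concrete (a Bernoulli encoding that turns encoder outputs into binary codes, and a diffusion process defined over those bits), here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names, the scalar `keep` parameter, and the specific noising rule (each bit is pulled toward a fair coin flip as `keep` goes from 1 to 0) are hypothetical choices that this summary does not confirm. No neural network is involved; random logits stand in for encoder outputs.

```python
import math
import random


def sigmoid(x):
    """Map a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def sample_bernoulli(probs, rng):
    """Draw one independent 0/1 sample per probability."""
    return [1 if rng.random() < p else 0 for p in probs]


def encode_binary(logits, rng):
    """Assumed encoder head: sample a binary code from Bernoulli(sigmoid(logits))."""
    return sample_bernoulli([sigmoid(v) for v in logits], rng)


def forward_noise(z0, keep, rng):
    """Assumed noising rule: P(bit = 1) = keep * z0 + 0.5 * (1 - keep).

    keep = 1 leaves the code untouched; keep = 0 yields pure coin flips.
    """
    probs = [keep * bit + 0.5 * (1.0 - keep) for bit in z0]
    return sample_bernoulli(probs, rng)


def agreement(a, b):
    """Fraction of positions where two binary codes match."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
# Stand-in for encoder outputs: 4096 logits (e.g. a small grid of binary vectors).
logits = [rng.gauss(0.0, 2.0) for _ in range(4096)]
z0 = encode_binary(logits, rng)

for keep in (1.0, 0.5, 0.0):
    zt = forward_noise(z0, keep, rng)
    print(f"keep={keep:.1f}  agreement with z0 = {agreement(z0, zt):.3f}")
```

Under this assumed rule, the noised code matches the clean code exactly at `keep = 1` and agrees with it on roughly half of the bits at `keep = 0`, which is the kind of fully random Bernoulli endpoint a reverse (denoising) model would start sampling from.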