Binary Latent Diffusion

Ze Wang, Jiang Wang, Zicheng Liu, Qiang Qiu

2023-04-11

Abstract: In this paper, we show that a binary latent space can be explored for compact yet expressive image representations. We model the bi-directional mappings between an image and the corresponding latent binary representation by training an auto-encoder with a Bernoulli encoding distribution. On the one hand, the binary latent space provides a compact discrete image representation whose distribution can be modeled more efficiently than that of pixels or continuous latent representations. On the other hand, we now represent each image patch as a binary vector instead of an index into a learned codebook, as in discrete image representations with vector quantization. In this way, we obtain binary latent representations that allow for better image quality and high-resolution image representations without any multi-stage hierarchy in the latent space. In this binary latent space, images can now be generated effectively using a binary latent diffusion model tailored specifically for modeling the prior over the binary image representations. We present both conditional and unconditional image generation experiments with multiple datasets, and show that the proposed method performs comparably to state-of-the-art methods while dramatically improving the sampling efficiency to as few as 16 steps without using any test-time acceleration. The proposed framework can also be seamlessly scaled to $1024 \times 1024$ high-resolution image generation without resorting to latent hierarchy or multi-stage refinements.
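The contrast between a binary vector per patch and a codebook index per patch can be made concrete with a small sketch. The code below is illustrative only and is not taken from the paper: the function names, the 32-bit code length, and the 1024-entry codebook are assumptions chosen for the example. It shows how encoder outputs could be turned into bits through a Bernoulli distribution, and how many distinct patch codes each representation can express.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def bernoulli_encode(logits, rng, deterministic=False):
    """Map per-bit encoder outputs (hypothetical logits) to a binary code.

    Each logit is squashed to a probability p, and the bit is drawn from
    Bernoulli(p). With deterministic=True the bit is thresholded at 0.5.
    """
    probs = [sigmoid(v) for v in logits]
    if deterministic:
        return [1 if p > 0.5 else 0 for p in probs]
    return [1 if rng.random() < p else 0 for p in probs]


def distinct_codes_binary(num_bits):
    """Number of different patch codes a binary vector of num_bits can take."""
    return 2 ** num_bits


def bits_per_patch_vq(codebook_size):
    """Bits needed to store one index into a VQ codebook of the given size."""
    return math.ceil(math.log2(codebook_size))


if __name__ == "__main__":
    rng = random.Random(0)
    print(bernoulli_encode([-2.0, 0.1, 3.0, -0.5], rng))
    # Assumed sizes, for illustration only.
    print(distinct_codes_binary(32), "codes from a 32-bit binary vector")
    print(bits_per_patch_vq(1024), "bits per patch for a 1024-entry codebook")
```

Under these assumed sizes, a 32-bit binary vector can take far more distinct values than a single index into a 1024-entry codebook, which is one way to read the abstract's claim about expressiveness. Training such an encoder end to end would additionally need a gradient estimator for the sampling step, which this sketch omits.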

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper primarily aims to address the following issues:

1. **Efficiently generating high-quality images**: Investigating how to reduce computational costs in the generation process while maintaining or improving image quality, especially in high-resolution image generation.
2. **Overcoming the limitations of existing models**: Proposing new solutions to address problems in existing generative models (such as GANs, VAEs, etc.), including unstable training and insufficient mode coverage.
3. **Exploring compact and expressive image representation methods**: Using binary latent space to represent images, achieving more efficient distribution modeling and effective image generation.
4. **Improving diffusion models**: Developing diffusion models specifically for binary latent space to better handle image generation tasks, particularly in modeling multivariate Bernoulli distributions.

Specifically, the paper proposes a framework based on binary latent space for learning compact binary representations of images and develops a binary latent diffusion model on this basis, capable of generating high-quality images in fewer diffusion steps. Additionally, this method can effectively scale to high-resolution image generation tasks without the need for complex hierarchical structures or multi-stage refinement processes.

In summary, the goal of the paper is to improve the quality and efficiency of image generation by introducing a novel binary latent space representation method and corresponding diffusion model, while addressing some limitations of existing models.
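To illustrate what a diffusion process over multivariate Bernoulli variables could look like, here is a minimal sketch of a forward (noising) process on a binary code. It rests on an assumption that is not stated in the text above: at each step, a fraction `beta` of every bit's signal is replaced by a fair coin flip, so the code drifts toward Bernoulli(0.5) noise. The schedule, the 16-step count (borrowed from the abstract's sampling figure), and all function names are hypothetical.

```python
import random


def cumulative_keep(betas):
    """Running product of (1 - beta): how much of the original bit survives."""
    keeps = []
    k = 1.0
    for beta in betas:
        k *= 1.0 - beta
        keeps.append(k)
    return keeps


def forward_probability(z0_bit, k_t):
    """P(z_t = 1 | z_0) under the assumed mixing rule.

    With weight k_t the original bit is kept; with weight (1 - k_t) it is
    replaced by a fair coin, which contributes probability 0.5.
    """
    return k_t * z0_bit + 0.5 * (1.0 - k_t)


def noisy_sample(z0, k_t, rng):
    """Draw z_t given the clean binary code z0 in one shot."""
    return [1 if rng.random() < forward_probability(b, k_t) else 0 for b in z0]


if __name__ == "__main__":
    rng = random.Random(0)
    num_steps = 16  # illustrative step count
    betas = [(t + 1) / num_steps for t in range(num_steps)]  # assumed schedule
    keeps = cumulative_keep(betas)
    z0 = [1, 0, 1, 1, 0, 0, 1, 0]
    for t in (0, 7, 15):
        print(t + 1, round(keeps[t], 4), noisy_sample(z0, keeps[t], rng))
```

At the final step of this assumed schedule nothing of the original code survives, so every bit is an unbiased coin flip. A generative model would then be trained to reverse this corruption, predicting Bernoulli parameters for the cleaner code at each step; that denoising network is outside the scope of the sketch.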
