Abstract: This paper introduces Bayesian Flow Networks (BFNs), a new class of generative model in which the parameters of a set of independent distributions are modified with Bayesian inference in the light of noisy data samples, then passed as input to a neural network that outputs a second, interdependent distribution. Starting from a simple prior and iteratively updating the two distributions yields a generative procedure similar to the reverse process of diffusion models; however it is conceptually simpler in that no forward process is required. Discrete and continuous-time loss functions are derived for continuous, discretised and discrete data, along with sample generation procedures. Notably, the network inputs for discrete data lie on the probability simplex, and are therefore natively differentiable, paving the way for gradient-based sample guidance and few-step generation in discrete domains such as language modelling. The loss function directly optimises data compression and places no restrictions on the network architecture. In our experiments BFNs achieve competitive log-likelihoods for image modelling on dynamically binarized MNIST and CIFAR-10, and outperform all known discrete diffusion models on the text8 character-level language modelling task.
What problem does this paper attempt to address?
The paper proposes Bayesian Flow Networks (BFNs), a new class of generative model intended to handle continuous, discretized, and discrete data within a single framework. In particular, BFNs target a weakness of diffusion models, which are harder to apply to discrete data, and offer a conceptually simpler generation process.
### Main problems and goals
1. **Generation quality on discrete data**:
- Diffusion models are harder to apply to discrete data (such as text), because corrupting discrete values with noise is a discontinuous operation, which makes the generation process difficult to optimize.
- BFNs address this by giving the network the parameters of a distribution over the data rather than noisy samples. These parameters are continuous even when the data itself is discrete, so the entire generation process is continuous and differentiable.
2. **Simplification of the generation process**:
- Unlike diffusion models, BFNs do not need a forward process to be defined, which makes the model easier to adapt to different data types and distributions.
- Generation resembles the reverse process of a diffusion model but is conceptually simpler: it starts from a simple prior and iteratively updates two distributions, the independent input distribution (through Bayesian inference) and the interdependent output distribution (through the neural network).
3. **Optimization of data compression and generation efficiency**:
- The loss function of BFNs directly optimizes data compression and places no restrictions on the network architecture, so the same objective applies across different generation tasks.
- For discrete data, the network inputs lie on the probability simplex and are therefore natively differentiable, which paves the way for gradient-based sample guidance and few-step generation.
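
To illustrate why inputs on the probability simplex stay continuous even for discrete data, here is a minimal sketch of one possible update rule. It assumes the category probabilities are reweighted by the exponential of a noisy observation and then renormalized. The function names, the Gaussian noise model in `noisy_observation`, and the accuracy parameter `alpha` are illustrative assumptions, not details given in this summary.

```python
import math
import random


def simplex_update(theta, y):
    """Reweight category probabilities theta by exp(y) and renormalise."""
    m = max(y)  # subtract the max so exp() cannot overflow
    weights = [t * math.exp(v - m) for t, v in zip(theta, y)]
    total = sum(weights)
    return [w / total for w in weights]


def noisy_observation(x, num_classes, alpha, rng):
    """Hypothetical noisy sample of class x (assumed Gaussian noise model).

    Each coordinate is Gaussian with mean alpha * (K * onehot(x) - 1)
    and variance alpha * K, where K is the number of classes.
    """
    std = math.sqrt(alpha * num_classes)
    return [
        rng.gauss(alpha * (num_classes * (1.0 if k == x else 0.0) - 1.0), std)
        for k in range(num_classes)
    ]


def run_updates(x, num_classes, alpha, steps, seed=0):
    """Start from a uniform prior and apply `steps` noisy updates."""
    rng = random.Random(seed)
    theta = [1.0 / num_classes] * num_classes
    for _ in range(steps):
        y = noisy_observation(x, num_classes, alpha, rng)
        theta = simplex_update(theta, y)
    return theta
```

Every intermediate `theta` is a vector of non-negative numbers summing to one, so it is a smooth function of the observations and can be passed to a network and differentiated, even though the underlying data `x` is a discrete class index.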
### Key innovation points
- **Bayesian update mechanism**: BFNs use Bayesian inference to update the parameters of the input distribution as noisy data samples arrive, so each step incorporates new information about the data in a principled way.
- **Continuous-time loss function**: Generalizing the discrete-time loss function to continuous time removes the need to fix the number of steps in advance during training and simplifies the computation.
- **Flexibility and universality**: BFNs can be flexibly applied to continuous, discretized, and discrete data with only a few modifications to the training process.
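
The Bayesian update mechanism can be sketched for the continuous case with the standard conjugate Gaussian update. The sketch assumes a Gaussian belief with mean `mu` and precision `rho`, and a noisy observation `y` of the data with known precision `alpha`. These names, the standard-normal prior, and the fixed per-step `alpha` are assumptions made for illustration and are not specified in this summary.

```python
import random


def gaussian_update(mu, rho, y, alpha):
    """Conjugate update of a Gaussian belief N(mu, 1/rho).

    y is a noisy observation of the data with precision alpha.
    Precisions add, and the new mean is the precision-weighted average.
    """
    rho_new = rho + alpha
    mu_new = (rho * mu + alpha * y) / rho_new
    return mu_new, rho_new


def run(x, alpha, steps, seed=0):
    """Start from a standard normal prior and observe x through noise."""
    rng = random.Random(seed)
    mu, rho = 0.0, 1.0  # simple prior: zero mean, unit precision
    for _ in range(steps):
        y = rng.gauss(x, alpha ** -0.5)  # noise std is 1 / sqrt(alpha)
        mu, rho = gaussian_update(mu, rho, y, alpha)
    return mu, rho
```

In this sketch the precision grows by `alpha` at every step, so the belief sharpens around the data value as more noisy samples are absorbed. In a BFN, the updated parameters would then be passed to the neural network, which is omitted here.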
### Experimental results
In the experiments, BFNs achieve competitive log-likelihoods for image modeling on dynamically binarized MNIST and CIFAR-10, and outperform all known discrete diffusion models on the text8 character-level language modeling task.
In conclusion, the paper introduces BFNs as a flexible generative modeling framework that is conceptually simpler than diffusion and particularly well suited to discrete data generation tasks.