Abstract: Generative models on discrete state-spaces have a wide range of potential applications, particularly in the domain of natural sciences. In continuous state-spaces, controllable and flexible generation of samples with desired properties has been realized using guidance on diffusion and flow models. However, these guidance approaches are not readily amenable to discrete state-space models. Consequently, we introduce a general and principled method for applying guidance on such models. Our method depends on leveraging continuous-time Markov processes on discrete state-spaces, which unlocks computational tractability for sampling from a desired guided distribution. We demonstrate the utility of our approach, Discrete Guidance, on a range of applications including guided generation of small-molecules, DNA sequences and protein sequences.
### What problem does this paper attempt to solve?
This paper addresses the problem of applying guidance to diffusion and flow models on discrete state spaces. Specifically, the authors propose a method named **Discrete Guidance** that enables controllable and flexible sample generation in discrete state spaces.
#### Background and Challenges
1. **Limitations of Existing Methods**:
- In continuous state spaces, sample generation with desired properties can already be achieved through guidance techniques. However, these guidance methods are not easily applicable to discrete state space models.
- Guidance for continuous diffusion models relies on the gradient of the log probability density (i.e., the score function), and this gradient is undefined on a discrete state space.
- Flow matching is usually built on linear interpolation, which does not carry over well to discrete state spaces, so alternative constructions are needed.
2. **Applications of Discrete State Spaces**:
- Discrete state spaces arise in natural language processing, biological sequence modeling (DNA, RNA, proteins), molecular graphs, and similar domains.
- Conditional generation is an important task in these fields, for example generating protein sequences with a specified structure or property.
#### Solution: Discrete Guidance
- **Core Idea**: Use continuous-time Markov chains (CTMCs) to make the guidance computation tractable in discrete state spaces.
- **Specific Methods**:
- Guidance is achieved by reweighting the unconditional transition rates \( R_t(x, \tilde{x}) \) to obtain the conditional rates \( R_t(x, \tilde{x} | y) \) for \( \tilde{x} \neq x \):
\[
R_t(x, \tilde{x} | y) = \frac{p(y | \tilde{x}, t)}{p(y | x, t)} R_t(x, \tilde{x})
\]
- For predictor guidance, a guidance strength \( \gamma \) can be introduced as an exponent on the likelihood ratio, and the formula becomes:
\[
R_t^{(\gamma)}(x, \tilde{x} | y) = \left( \frac{p(y | \tilde{x}, t)}{p(y | x, t)} \right)^\gamma R_t(x, \tilde{x})
\]
- For predictor-free guidance, the conditional and unconditional rates are combined using the following formula:
\[
R_t^{(\gamma)}(x, \tilde{x} | y) = R_t(x, \tilde{x} | y)^\gamma R_t(x, \tilde{x})^{1-\gamma}
\]
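The three rate formulas above can be illustrated with a small NumPy sketch on a toy three-state space. Everything named here is an assumption made for illustration and does not come from the paper: the helper functions, the uniform unconditional rates, the made-up predictor values `p(y | x, t)`, and the first-order (Euler) sampling step, which is only one simple way to simulate a CTMC. The formulas are applied to off-diagonal entries only, and each diagonal entry is then set so that the row sums to zero.

```python
import numpy as np


def with_diagonal(off_diag):
    """Keep off-diagonal entries; set each diagonal so the row sums to zero."""
    rates = np.array(off_diag, dtype=float)
    np.fill_diagonal(rates, 0.0)
    np.fill_diagonal(rates, -rates.sum(axis=1))
    return rates


def predictor_guided_rates(rates, log_p_y, gamma=1.0):
    """Off-diagonal: R(x, x~) * (p(y | x~) / p(y | x)) ** gamma.

    rates   : (S, S) unconditional rate matrix, row index = current state x
    log_p_y : (S,) log p(y | x, t) per state (hypothetical predictor output)
    """
    log_p_y = np.asarray(log_p_y, dtype=float)
    # log_ratio[x, x~] = log p(y | x~) - log p(y | x)
    log_ratio = log_p_y[None, :] - log_p_y[:, None]
    scaled = np.asarray(rates, dtype=float) * np.exp(gamma * log_ratio)
    return with_diagonal(scaled)


def predictor_free_rates(cond_rates, uncond_rates, gamma=1.0):
    """Off-diagonal: R(x, x~ | y) ** gamma * R(x, x~) ** (1 - gamma).

    Assumes strictly positive off-diagonal rates in both matrices.
    """
    cond = np.asarray(cond_rates, dtype=float)
    uncond = np.asarray(uncond_rates, dtype=float)
    off = ~np.eye(cond.shape[0], dtype=bool)  # mask of off-diagonal entries
    guided = np.zeros_like(cond)
    guided[off] = cond[off] ** gamma * uncond[off] ** (1.0 - gamma)
    return with_diagonal(guided)


def euler_step(state, rates, dt, rng):
    """One first-order step: jump to x~ with probability about R(x, x~) * dt."""
    probs = dt * rates[state]          # new array, `rates` is not modified
    probs[state] = 0.0
    probs[state] = 1.0 - probs.sum()   # remaining mass: stay in place
    return rng.choice(len(probs), p=probs)


# Toy setup: uniform unconditional rates and a made-up predictor.
uncond = with_diagonal(np.ones((3, 3)))
log_p_y = np.log(np.array([0.1, 0.3, 0.6]))

cond = predictor_guided_rates(uncond, log_p_y, gamma=1.0)
strong = predictor_guided_rates(uncond, log_p_y, gamma=2.0)
mixed = predictor_free_rates(cond, uncond, gamma=2.0)

rng = np.random.default_rng(0)
next_state = euler_step(0, cond, dt=0.01, rng=rng)
```

In this toy setup, `gamma = 0` recovers the unconditional rates and `gamma = 1` recovers the conditional rates in both variants. When the conditional rates are exactly the predictor-reweighted ones, the two variants also agree for other values of `gamma`, which is why `mixed` matches `strong` here.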
#### Experimental Verification
- **Scope of Application**: The authors conducted experiments in multiple domains, including small-molecule generation, DNA sequence generation, and protein sequence generation.
- **Results**: Experiments show that Discrete Guidance can effectively generate samples that meet the specified conditions and, in most cases, outperforms existing guidance methods for discrete-time discrete diffusion (such as DiGress).
#### Summary
The main contribution of this paper is Discrete Guidance, a theoretically rigorous and general framework for guiding diffusion and flow models in discrete state spaces. The method applies to continuous-time diffusion and flow models and can also be extended to a wider range of CTMC-based generative models.