Unlocking Guidance for Discrete State-Space Diffusion and Flow Models

Hunter Nisonoff, Junhao Xiong, Stephan Allenspach, Jennifer Listgarten
2024-10-10
Abstract: Generative models on discrete state-spaces have a wide range of potential applications, particularly in the domain of natural sciences. In continuous state-spaces, controllable and flexible generation of samples with desired properties has been realized using guidance on diffusion and flow models. However, these guidance approaches are not readily amenable to discrete state-space models. Consequently, we introduce a general and principled method for applying guidance on such models. Our method depends on leveraging continuous-time Markov processes on discrete state-spaces, which unlocks computational tractability for sampling from a desired guided distribution. We demonstrate the utility of our approach, Discrete Guidance, on a range of applications including guided generation of small-molecules, DNA sequences and protein sequences.
Machine Learning
What problem does this paper attempt to address?
This paper addresses the problem of guiding diffusion and flow models on discrete state spaces. Specifically, the authors propose a method named **Discrete Guidance** for controllable and flexible sample generation in discrete state spaces.

#### Background and Challenges

1. **Limitations of Existing Methods**:
   - In continuous state spaces, samples with desired properties can already be generated through guidance techniques. However, these guidance methods are not easily applicable to discrete state-space models.
   - Guidance for continuous diffusion models relies on the gradient of the log probability density (the score function); in a discrete state space this gradient is undefined.
   - Flow matching is usually built on linear interpolation, which performs poorly in discrete state spaces, so alternative constructions are required.
2. **Applications of Discrete State Spaces**:
   - Discrete state spaces arise in natural language processing, biological sequences (such as DNA, RNA, and proteins), molecular graphs, and similar domains.
   - In these fields, conditional generation tasks are very important, such as generating protein sequences that match a specific structure or property.

#### Solution: Discrete Guidance

- **Core Idea**: Use continuous-time Markov chains (CTMCs) to make the guidance computation tractable in discrete state spaces.
- **Specific Methods**:
  - Guidance is achieved by adjusting the transition rate matrix. Here \( R_t(x, \tilde{x}) \) is the unconditional rate of jumping from state \( x \) to a different state \( \tilde{x} \) at time \( t \), and \( p(y | x, t) \) is the probability of the desired property \( y \) given the noisy state \( x \) at time \( t \). The conditional rates are:
    \[ R_t(x, \tilde{x} | y) = \frac{p(y | \tilde{x}, t)}{p(y | x, t)} R_t(x, \tilde{x}) \]
  - For predictor guidance, a guidance strength \( \gamma \) can be introduced, and the formula becomes:
    \[ R_t^{(\gamma)}(x, \tilde{x} | y) = \left( \frac{p(y | \tilde{x}, t)}{p(y | x, t)} \right)^\gamma R_t(x, \tilde{x}) \]
  - For predictor-free guidance, the conditional and unconditional rates are combined directly:
    \[ R_t^{(\gamma)}(x, \tilde{x} | y) = R_t(x, \tilde{x} | y)^\gamma R_t(x, \tilde{x})^{1-\gamma} \]

#### Experimental Verification

- **Scope of Application**: The authors conducted experiments in multiple fields, including small-molecule generation, DNA sequence generation, and protein sequence generation.
- **Results**: The experiments show that Discrete Guidance can effectively generate samples that meet the specified conditions, and in most cases it outperforms existing discrete-time guidance methods for discrete diffusion (such as DiGress).

#### Summary

The main contribution of this paper is a theoretically rigorous and general framework, Discrete Guidance, for guiding diffusion and flow models in discrete state spaces. The method applies to continuous-time diffusion and flow models and can also be extended to a wider range of CTMC-based generative models.
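
To make the rate adjustments listed under Specific Methods concrete, here is a minimal NumPy sketch for a single discrete variable with a handful of states. It is an illustration under stated assumptions, not the authors' implementation: the function names `predictor_guided_rates` and `predictor_free_rates`, the array layout (one row of the rate matrix, indexed by target state), and the convention of setting the diagonal entry so that the row sums to zero are all choices made for this example.

```python
import numpy as np


def predictor_guided_rates(rates, log_p_y, x, gamma=1.0):
    """Predictor guidance for one row of the rate matrix (hypothetical helper).

    rates   : (S,) unconditional rates R_t(x, x~) out of the current state x
    log_p_y : (S,) log p(y | x~, t) for every candidate state x~
              (assumed to come from some predictor trained on noisy states)
    x       : index of the current state
    gamma   : guidance strength
    """
    rates = np.asarray(rates, dtype=float)
    log_p_y = np.asarray(log_p_y, dtype=float)
    # (p(y | x~, t) / p(y | x, t)) ** gamma, computed in log space
    ratio = np.exp(gamma * (log_p_y - log_p_y[x]))
    guided = rates * ratio
    # Only off-diagonal rates are rescaled; the diagonal is then set so that
    # the row sums to zero, as required of a CTMC rate matrix.
    guided[x] = 0.0
    guided[x] = -guided.sum()
    return guided


def predictor_free_rates(cond_rates, uncond_rates, x, gamma=1.0):
    """Predictor-free guidance: R(x, x~ | y)**gamma * R(x, x~)**(1 - gamma)."""
    cond = np.asarray(cond_rates, dtype=float).copy()
    uncond = np.asarray(uncond_rates, dtype=float).copy()
    cond[x] = 0.0
    uncond[x] = 0.0
    guided = np.zeros_like(cond)
    # Assumption: a transition with zero rate in either model stays disallowed,
    # which also avoids 0 ** negative when gamma > 1.
    allowed = (cond > 0) & (uncond > 0)
    guided[allowed] = cond[allowed] ** gamma * uncond[allowed] ** (1.0 - gamma)
    guided[x] = -guided.sum()
    return guided


if __name__ == "__main__":
    # Toy example: three states, currently in state 0.
    uncond = np.array([-3.0, 1.0, 2.0])
    log_p = np.log(np.array([0.2, 0.4, 0.1]))
    print(predictor_guided_rates(uncond, log_p, x=0, gamma=1.0))
    print(predictor_guided_rates(uncond, log_p, x=0, gamma=2.0))
```

With exact conditional rates, the two forms agree: substituting \( R_t(x, \tilde{x} | y) \) from the first formula into the predictor-free expression gives back the predictor-guided rates for any \( \gamma \). In practice the two differ because the predictor and the conditional model are learned separately.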