Abstract: Posterior sampling in contextual bandits with a Gaussian prior can be implemented exactly or approximately using the Laplace approximation. The Gaussian prior is computationally efficient but it cannot describe complex distributions. In this work, we propose approximate posterior sampling algorithms for contextual bandits with a diffusion model prior. The key idea is to sample from a chain of approximate conditional posteriors, one for each stage of the reverse process, which are estimated in a closed form using the Laplace approximation. Our approximations are motivated by posterior sampling with a Gaussian prior, and inherit its simplicity and efficiency. They are asymptotically consistent and perform well empirically on a variety of contextual bandit problems.

What problem does this paper attempt to address?

This paper addresses how to use a diffusion model as the prior for approximate posterior sampling in contextual bandits. The traditional Gaussian prior is computationally efficient, but it cannot describe complex distributions, in particular multimodal ones. The paper therefore proposes posterior sampling algorithms based on a diffusion model prior that overcome this limitation and perform well on a variety of contextual bandit problems.

### Main Contributions:

1. **Algorithm Innovation**: New approximate posterior sampling methods for linear models and generalized linear models (GLMs). They sample from a chain of approximate conditional posteriors, each estimated in closed form using the Laplace approximation.
2. **Theoretical Contribution**: A proof that the proposed posterior approximations are asymptotically consistent. Specifically, as the number of observations increases, the conditional posterior concentrates on a scaled version of the unknown model parameters.
3. **Experimental Verification**: Experiments on a variety of contextual bandit problems demonstrate the effectiveness and robustness of the proposed methods, which outperform existing methods especially under complex priors.

### Method Overview:

- **Linear Model**: At each stage, the conditional posterior is the product of two Gaussian distributions, representing prior knowledge and diffusion evidence respectively.
- **Generalized Linear Model**: At each stage, the conditional posterior is obtained by combining prior knowledge and evidence through the Laplace approximation.
- **Diffusion Model**: A diffusion model serves as the prior, with data generated and sampled through its forward and reverse processes.
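
The product of two Gaussians mentioned for the linear model has a well-known closed form: the precisions add, and the mean is a precision-weighted average of the two means. The sketch below illustrates this in one dimension. It is a minimal illustration under stated assumptions, not the paper's algorithm: the scalar setting, the function names `gaussian_product` and `sample_gaussian_product`, and the example numbers are all hypothetical.

```python
import random


def gaussian_product(mean_a, var_a, mean_b, var_b):
    """Mean and variance of the normalized product of two 1-D Gaussian densities."""
    # Precisions (inverse variances) add under multiplication of densities.
    precision = 1.0 / var_a + 1.0 / var_b
    var = 1.0 / precision
    # The mean is the precision-weighted average of the two means.
    mean = var * (mean_a / var_a + mean_b / var_b)
    return mean, var


def sample_gaussian_product(mean_a, var_a, mean_b, var_b, rng=None):
    """Draw one sample from the normalized product of the two Gaussians."""
    rng = rng or random  # fall back to the module-level generator
    mean, var = gaussian_product(mean_a, var_a, mean_b, var_b)
    return rng.gauss(mean, var ** 0.5)


# Hypothetical numbers: a prior-like term N(0, 1) and an evidence-like term N(2, 0.25).
mean, var = gaussian_product(0.0, 1.0, 2.0, 0.25)
draw = sample_gaussian_product(0.0, 1.0, 2.0, 0.25, rng=random.Random(0))
```

The combined Gaussian sits closer to the term with the smaller variance, which is why the evidence dominates as more observations arrive. In the multivariate case the same rule applies with precision matrices in place of inverse variances.
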

### Experimental Results:

- **Synthetic Experiments**: On three synthetic problems, the proposed method (DiffTS) outperforms the baseline methods, especially with multimodal priors.
- **MovieLens Experiments**: On this recommendation task, the proposed method attains significantly lower cumulative regret in both linear and logistic bandit problems.

### Related Work:

- **Multi-armed Bandits with Diffusion Models**: Hsieh et al. proposed a Thompson sampling method for K-armed bandits. This paper extends the idea to the more general contextual bandit setting and develops new posterior sampling approximations.
- **Posterior Sampling in Diffusion Models**: Chung et al. proposed the DPS method, which samples from the posterior by adding the gradient of the observation likelihood. That method becomes unstable as the number of observations increases. The method in this paper avoids the problem by approximating the posterior through the product of the prior and evidence distributions.

In conclusion, this paper proposes a new posterior sampling method built on a diffusion model prior. The method is proven to be asymptotically consistent, and experiments confirm its effectiveness and robustness.
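
The experiments above are compared by cumulative regret. As a reminder of what that metric measures, the sketch below computes it from per-round expected rewards: the regret in a round is the expected reward of the best arm minus that of the chosen arm, and the cumulative regret is the running sum. The function name `cumulative_regret` and the reward values are hypothetical and are not taken from the paper.

```python
from itertools import accumulate


def cumulative_regret(optimal_means, chosen_means):
    """Running sum of per-round regret (optimal minus chosen expected reward)."""
    if len(optimal_means) != len(chosen_means):
        raise ValueError("both sequences must cover the same rounds")
    per_round = [best - got for best, got in zip(optimal_means, chosen_means)]
    return list(accumulate(per_round))


# Hypothetical expected rewards over four rounds.
optimal = [1.0, 1.0, 1.0, 1.0]
chosen = [0.5, 1.0, 0.75, 1.0]
curve = cumulative_regret(optimal, chosen)
```

A curve that flattens over time means the algorithm has stopped choosing suboptimal arms, so a lower final value indicates better performance.
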

Online Posterior Sampling with a Diffusion Prior

Langevin Monte Carlo for Contextual Bandits

Posterior Sampling in High Dimension via Diffusion Processes

Posterior sampling via Langevin dynamics based on generative priors

Efficient and Adaptive Posterior Sampling Algorithms for Bandits

Posterior sampling with Adaptive Gaussian Processes in Bayesian parameter identification

Diffusion Posterior Sampling for General Noisy Inverse Problems

Diffusion Posterior Sampling is Computationally Intractable

PG-TS: Improved Thompson Sampling for Logistic Contextual Bandits

Learning to Optimize via Posterior Sampling

Sampling from Gaussian Process Posteriors using Stochastic Gradient Descent

Scalable Nonparametric Sampling from Multimodal Posteriors with the Posterior Bootstrap

Gaussian is All You Need: A Unified Framework for Solving Inverse Problems via Diffusion Posterior Sampling

Amortized Posterior Sampling with Diffusion Prior Distillation

Thompson Sampling for Stochastic Bandits with Noisy Contexts: An Information-Theoretic Regret Analysis

Optimistic Information Directed Sampling

Thompson Sampling in Partially Observable Contextual Bandits

Divide-and-Conquer Posterior Sampling for Denoising Diffusion Priors

Continuous Gaussian mixture solution for linear Bayesian inversion with application to Laplace priors

Conditional sampling within generative diffusion models

Efficient sampling for Gaussian linear regression with arbitrary priors