Abstract: We present a mathematical framework and computational methods to optimally design a finite number of sequential experiments. We formulate this sequential optimal experimental design (sOED) problem as a finite-horizon partially observable Markov decision process (POMDP) in a Bayesian setting and with information-theoretic utilities. It is built to accommodate continuous random variables, general non-Gaussian posteriors, and expensive nonlinear forward models. sOED then seeks an optimal design policy that incorporates elements of both feedback and lookahead, generalizing the suboptimal batch and greedy designs. We solve for the sOED policy numerically via policy gradient (PG) methods from reinforcement learning, and derive and prove the PG expression for sOED. Adopting an actor-critic approach, we parameterize the policy and value functions using deep neural networks and improve them using gradient estimates produced from simulated episodes of designs and observations. The overall PG-sOED method is validated on a linear-Gaussian benchmark, and its advantages over batch and greedy designs are demonstrated through a contaminant source inversion problem in a convection-diffusion field.
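To make the POMDP view concrete, here is a minimal sketch of simulating one sOED episode. It assumes a hypothetical scalar linear-Gaussian model y_k = θ d_k + ε_k, chosen only because the posterior stays Gaussian and the KL reward then has a closed form. The policy is a hand-written function standing in for the neural-network actor described in the abstract, and the names `simulate_episode`, `gaussian_kl`, `noise_var` and `seed` are illustrative. The belief state is the posterior mean and variance; the policy sees it before choosing each design (feedback), and the episode reward is the terminal KL divergence from prior to final posterior.

```python
import math
import random


def gaussian_kl(mu_q, var_q, mu_p, var_p):
    """D_KL(N(mu_q, var_q) || N(mu_p, var_p)) for scalar Gaussians."""
    return 0.5 * (math.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)


def simulate_episode(policy, n_exp, prior_mu=0.0, prior_var=1.0, noise_var=0.1, seed=0):
    """Simulate one episode of the hypothetical model y_k = theta * d_k + eps_k.

    The belief state is (posterior mean, posterior variance); policy(k, state)
    returns the design d_k. Returns the designs, the observations, and the
    terminal reward D_KL(p(theta | I_N) || p(theta | I_0)).
    """
    rng = random.Random(seed)
    theta = rng.gauss(prior_mu, math.sqrt(prior_var))  # "true" parameter, hidden from the policy
    mu, var = prior_mu, prior_var
    designs, observations = [], []
    for k in range(n_exp):
        d = policy(k, (mu, var))  # feedback: the design depends on the current posterior
        y = theta * d + rng.gauss(0.0, math.sqrt(noise_var))  # simulated noisy observation
        # Conjugate Gaussian update: posterior precision = prior precision + d^2 / noise_var
        new_var = 1.0 / (1.0 / var + d * d / noise_var)
        mu = new_var * (mu / var + d * y / noise_var)
        var = new_var
        designs.append(d)
        observations.append(y)
    return designs, observations, gaussian_kl(mu, var, prior_mu, prior_var)


# Usage: a fixed (open-loop) policy versus a simple, hypothetical feedback rule.
_, _, r_fixed = simulate_episode(lambda k, s: 1.0, n_exp=3)
_, _, r_feedback = simulate_episode(lambda k, s: min(3.0, 1.0 / math.sqrt(s[1])), n_exp=3)
```

In a PG method, many such episodes would be simulated and their rewards used to improve a parameterized policy; here the policies are fixed, so the sketch only shows the design, observe, update, reward loop.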

What problem does this paper attempt to address?

The paper addresses the optimal design of a sequence of experiments for nonlinear models. Specifically, it proposes a mathematical framework and computational methods for optimally designing a finite number of sequential experiments in a Bayesian setting. The problem is formulated as a finite-horizon partially observable Markov decision process (POMDP), with information-theoretic utilities measuring the value of the experiments. The approach can handle continuous random variables, general non-Gaussian posterior distributions, and expensive nonlinear forward models.

### Main Contributions

1. **Problem Formulation**:
   - Formulate the sequential optimal experimental design (sOED) problem as a finite-horizon POMDP in a Bayesian setting, applicable to continuous random variables.
   - Show that sOED generalizes the suboptimal batch and greedy designs.
2. **Algorithm Proposal**:
   - Propose a policy-gradient (PG)-based sOED algorithm, called PG-sOED.
   - Derive and prove the key gradient expression, and propose a Monte Carlo estimator for it.
   - Parameterize the policy and value functions with deep neural network (DNN) architectures, and describe the numerical settings of the entire method in detail.
   - Adopt an actor-critic approach that represents and learns the policy explicitly, which allows the use of gradient-based optimization algorithms.
3. **Performance Verification**:
   - Validate PG-sOED on a linear-Gaussian benchmark.
   - Demonstrate the advantages of PG-sOED over batch and greedy designs on a contaminant source inversion problem in a convection-diffusion field, which involves an expensive forward model.
   - Illustrate the speed and optimality advantages of PG-sOED across both test cases.

### Mathematical Formulas

- **Bayesian Update Formula**:

  \[
  p(\theta | d_k, y_k, I_k) = \frac{p(y_k | \theta, d_k, I_k)\, p(\theta | I_k)}{p(y_k | d_k, I_k)}
  \]

  where \( I_k=\{d_0, y_0, \ldots, d_{k-1}, y_{k-1}\} \) collects all experimental designs and observations made before experiment \( k \). Since \( I_{k+1} = I_k \cup \{d_k, y_k\} \), the left-hand side is the posterior \( p(\theta | I_{k+1}) \) used in the rewards below.
- **KL Divergence as a Reward Function**, where \( x_k \) is the state at stage \( k \), whose belief component is the posterior \( p(\cdot | I_k) \):
  - Terminal reward form:

    \[
    g_N(x_N)=D_{\text{KL}}\big(p(\cdot | I_N)\,\|\,p(\cdot | I_0)\big)=\int_\Theta p(\theta | I_N)\ln\left(\frac{p(\theta | I_N)}{p(\theta | I_0)}\right)d\theta
    \]

  - Incremental reward form:

    \[
    g_k(x_k, d_k, y_k)=D_{\text{KL}}\big(p(\cdot | I_{k+1})\,\|\,p(\cdot | I_k)\big)=\int_\Theta p(\theta | I_{k+1})\ln\left(\frac{p(\theta | I_{k+1})}{p(\theta | I_k)}\right)d\theta
    \]

  In expectation over the observations, the sum of the incremental rewards \( g_0, \ldots, g_{N-1} \) equals the terminal reward \( g_N \), although the two generally differ for an individual episode.
- **Policy Gradient Expression**:
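For orientation, actor-critic methods with a deterministic policy typically rest on the deterministic policy gradient theorem. Stated for a finite horizon without discounting, and assuming a design policy \( d_k = \mu_{k,w}(x_k) \) with parameters \( w \), expected utility \( U(w) \), and action-value functions \( Q_k^{\pi_w} \) (notation assumed here), it reads as below. This is the generic form, offered as a sketch rather than a transcription of the paper's own expression; \( \nabla_w \mu_{k,w} \) denotes the Jacobian of the policy with respect to \( w \).

\[
\nabla_w U(w) = \sum_{k=0}^{N-1} \mathbb{E}_{x_k | \pi_w}\left[ \nabla_w \mu_{k,w}(x_k)\, \nabla_{d_k} Q_k^{\pi_w}(x_k, d_k)\Big|_{d_k = \mu_{k,w}(x_k)} \right]
\]

Replacing the expectation by an average over \( M \) simulated episodes and \( Q_k^{\pi_w} \) by a learned critic \( Q_{k,\eta} \) (hypothetical critic parameters \( \eta \)) gives a Monte Carlo estimate:

\[
\nabla_w U(w) \approx \frac{1}{M}\sum_{i=1}^{M}\sum_{k=0}^{N-1} \nabla_w \mu_{k,w}\big(x_k^{(i)}\big)\, \nabla_{d_k} Q_{k,\eta}\big(x_k^{(i)}, d_k\big)\Big|_{d_k = \mu_{k,w}(x_k^{(i)})}
\]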

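The Bayesian update and the two KL reward forms in the summary can be checked numerically for a scalar parameter. The sketch below is a minimal illustration assuming a uniform grid over θ, a hypothetical nonlinear forward model G(θ, d) = exp(-(θ - d)^2) with additive Gaussian noise, two made-up (design, observation) pairs, and Riemann-sum quadrature. The grid approach is chosen here for simplicity and is not presented as the paper's posterior representation. Because no Gaussian form is assumed for the posterior, the same code handles non-Gaussian posteriors.

```python
import math


def normal_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def forward_model(theta, d):
    # Hypothetical nonlinear forward model G(theta, d), assumed for illustration only.
    return math.exp(-(theta - d) ** 2)


def bayes_update(grid, prior, d, y, noise_std):
    """Posterior p(theta | I_{k+1}) on the grid from p(theta | I_k) and the pair (d_k, y_k)."""
    h = grid[1] - grid[0]
    unnorm = [normal_pdf(y, forward_model(t, d), noise_std) * p for t, p in zip(grid, prior)]
    evidence = sum(unnorm) * h  # p(y_k | d_k, I_k), the normalizing constant
    return [u / evidence for u in unnorm]


def kl_divergence(p, q, h):
    """D_KL(p || q) approximated by a Riemann sum on a uniform grid (terms with p = 0 dropped)."""
    return h * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


# Standard normal prior p(theta | I_0) on a uniform grid over [-4, 4], renormalized.
n = 401
grid = [-4.0 + 8.0 * i / (n - 1) for i in range(n)]
h = grid[1] - grid[0]
prior = [normal_pdf(t, 0.0, 1.0) for t in grid]
z = h * sum(prior)
prior = [p / z for p in prior]

# Two made-up (design, observation) pairs, processed sequentially.
posteriors, incremental = [prior], []
for d, y in [(0.5, 0.8), (1.5, 0.3)]:
    post = bayes_update(grid, posteriors[-1], d, y, noise_std=0.1)
    incremental.append(kl_divergence(post, posteriors[-1], h))  # incremental reward g_k
    posteriors.append(post)
terminal = kl_divergence(posteriors[-1], posteriors[0], h)  # terminal reward g_N with N = 2
```

Normalizing by the evidence at each step keeps every posterior a proper density on the grid, so the Riemann-sum KL values are nonnegative.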
Bayesian Sequential Optimal Experimental Design for Nonlinear Models Using Policy Gradient Reinforcement Learning

Variational Sequential Optimal Experimental Design using Reinforcement Learning

Successive Convex Approximation Based Off-Policy Optimization for Constrained Reinforcement Learning

Policy-Based Bayesian Experimental Design for Non-Differentiable Implicit Models

Statistically Efficient Bayesian Sequential Experiment Design via Reinforcement Learning with Cross-Entropy Estimators

Myopic Bayesian Design of Experiments via Posterior Sampling and Probabilistic Programming

Sequential infinite-dimensional Bayesian optimal experimental design with derivative-informed latent attention neural operator

Goal-Oriented Bayesian Optimal Experimental Design for Nonlinear Models using Markov Chain Monte Carlo

Stochastic Cubic-Regularized Policy Gradient Method

BINOCULARS for Efficient, Nonmyopic Sequential Experimental Design

Accurate, scalable, and efficient Bayesian optimal experimental design with derivative-informed neural operators

Sequential Bayesian experimental designs via reinforcement learning

Bayesian Sequential Experimental Design for a Partially Linear Model with a Gaussian Process Prior

PASOA- PArticle baSed Bayesian Optimal Adaptive design

Stochastic Gradient Bayesian Optimal Experimental Designs for Simulation-based Inference

Instance-Dependent Near-Optimal Policy Identification in Linear MDPs via Online Experiment Design

Learning Arbitrary Quantities of Interest from Expensive Black-Box Functions through Bayesian Sequential Optimal Design

Optimizing Sequential Medical Treatments with Auto-Encoding Heuristic Search in POMDPs

Optimal design for linear models via gradient flow

Sequential Experimental Design for X-Ray CT Using Deep Reinforcement Learning

Bayesian Experimental Design via Contrastive Diffusions