Optimal design for linear models via gradient flow

Ruhui Jin, Martin Guerra, Qin Li, Stephen Wright
2024-06-11
Abstract: Optimal experimental design (OED) aims to choose the observations in an experiment to be as informative as possible, according to certain statistical criteria. In the linear case (when the observations depend linearly on the unknown parameters), it seeks the optimal weights over rows of the design matrix A under certain criteria. Classical OED assumes a discrete design space and thus a design matrix with finite dimensions. In many practical situations, however, the design space is continuous-valued, so that the OED problem is one of optimizing over a continuous-valued design space. The objective becomes a functional over the probability measure, instead of over a finite-dimensional vector. This change of perspective requires a new set of techniques that can handle optimizing over probability measures, and Wasserstein gradient flow becomes a natural candidate. Both the first-order criticality and the convexity properties of the OED objective are presented. Computationally, Monte Carlo particle simulation is deployed to formulate the main algorithm. This algorithm is applied to two elliptic inverse problems.
Numerical Analysis
What problem does this paper attempt to address?
### Problems the paper attempts to solve

The paper "Optimal Design for Linear Models via Gradient Flow" addresses optimal experimental design (OED) in continuous design spaces. Specifically, it studies how to select observation points in a continuous design space so that they are as informative as possible with respect to a chosen statistical criterion.

### Background and motivation

1. **Traditional OED problems**:
   - In traditional OED problems, the design space is usually discrete, so the design matrix \( A \) has finitely many rows. The goal is to select the optimal weights \( w \) over these finitely many observation points to optimize a statistical criterion (such as A-optimality or D-optimality).
   - Mathematically, this can be expressed as the optimization problem
     \[ \min_{w} F[w] \quad \text{subject to} \quad w \geq 0, \quad \sum_{i = 1}^m w_i = 1 \]
     where \( F[w] \) is the design criterion, such as the A-optimal or D-optimal criterion.
2. **Continuous design spaces**:
   - In many practical applications, however, the design space is continuous. For example, in medical imaging or climate science, the observation points can be continuous spatial coordinates or angles.
   - In this case, the design variable \( \theta \) takes continuous values, and the design matrix \( A \) has infinitely many rows. The problem therefore becomes one of optimizing over the space of probability measures on the design space.
   - The objective is now a functional defined on probability measures rather than a function of a finite-dimensional vector.

### Solutions

1. **Wasserstein gradient flow**:
   - To handle optimization over continuous design spaces, the paper introduces the Wasserstein gradient flow, an optimization technique defined on the space of probability measures.
   - By defining the Wasserstein gradient of the objective functional \( F[\rho] \), one obtains a partial differential equation (PDE) that describes the evolution of the probability measure \( \rho \):
     \[ \frac{\partial \rho}{\partial t} = -\nabla_{W_2} F[\rho] = \nabla_\theta \cdot \left( \rho \nabla_\theta \frac{\delta F[\rho]}{\delta \rho} \right) \]
2. **Particle gradient flow algorithm**:
   - To solve the above PDE numerically, the paper proposes a method based on Monte Carlo particle simulation. The steps are as follows:
     - Initialize a set of particles \( \{\theta_i\}_{i = 1}^N \) drawn from an initial probability measure \( \rho_0 \).
     - Move each particle in the gradient descent direction with step size \( dt \):
       \[ \theta_i^{(t)} \leftarrow \theta_i^{(t - 1)} - dt \, \nabla_\theta \frac{\delta F[\rho_N^{(t - 1)}]}{\delta \rho}(\theta_i^{(t - 1)}) \]
     - Update the probability measure \( \rho_N \) as the empirical measure of the updated particles, that is, the average of Dirac masses at the particle positions:
       \[ \rho_N^{(t)} = \frac{1}{N} \sum_{i = 1}^N \delta_{\theta_i^{(t)}} \]
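
The particle update can be illustrated with a minimal Python sketch on a toy problem. Everything specific here is an assumption made for illustration and is not taken from the paper: the design row \( a(\theta) = (\cos\theta, \sin\theta) \), the A-optimal criterion written as \( F[\rho] = \mathrm{Tr}(M[\rho]^{-1}) \) with \( M[\rho] = \int a(\theta) a(\theta)^\top d\rho(\theta) \), the step size, and the function names. Under these assumptions the first variation is \( \frac{\delta F}{\delta \rho}(\theta) = -a(\theta)^\top M^{-2} a(\theta) \), and for this toy model \( \mathrm{Tr}(M) = 1 \), so the criterion cannot fall below 4.

```python
import numpy as np


def features(theta):
    """Hypothetical design row a(theta) = (cos theta, sin theta), one row per particle."""
    return np.stack([np.cos(theta), np.sin(theta)], axis=-1)


def features_grad(theta):
    """Derivative of each design row with respect to theta."""
    return np.stack([-np.sin(theta), np.cos(theta)], axis=-1)


def a_optimal_objective(theta):
    """Assumed A-optimal criterion F = Tr(M^{-1}) with M = (1/N) sum_i a_i a_i^T."""
    A = features(theta)
    M = A.T @ A / len(theta)
    return np.trace(np.linalg.inv(M))


def particle_gradient_flow(theta0, dt=1e-3, n_steps=2000):
    """Explicit Euler particle scheme for the gradient flow of the assumed criterion."""
    theta = np.array(theta0, dtype=float)
    history = [a_optimal_objective(theta)]
    for _ in range(n_steps):
        A = features(theta)
        dA = features_grad(theta)
        M = A.T @ A / len(theta)
        M_inv = np.linalg.inv(M)
        M_inv2 = M_inv @ M_inv
        # Gradient in theta of the first variation -a^T M^{-2} a,
        # evaluated at every particle: -2 a'(theta)^T M^{-2} a(theta).
        grad = -2.0 * np.einsum("ni,ij,nj->n", dA, M_inv2, A)
        # Move each particle against the gradient.
        theta = theta - dt * grad
        history.append(a_optimal_objective(theta))
    return theta, np.array(history)


# Start from a poorly spread design: all angles clustered in [0, 1].
rng = np.random.default_rng(0)
theta0 = rng.uniform(0.0, 1.0, size=200)
theta, history = particle_gradient_flow(theta0)
print("initial criterion:", history[0])
print("final criterion:  ", history[-1])
```

In this toy setting the empirical measure \( \rho_N \) is represented implicitly by the array `theta`, so no separate update of \( \rho_N \) is needed. A design space with a boundary would additionally require some constraint handling, which this sketch does not attempt.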