Abstract: We consider the constrained sampling problem where the goal is to sample from a target distribution $\pi(x)\propto e^{-f(x)}$ when $x$ is constrained to lie on a convex body $\mathcal{C}$. Motivated by penalty methods from continuous optimization, we propose penalized Langevin dynamics (PLD) and penalized underdamped Langevin Monte Carlo (PULMC) methods that convert the constrained sampling problem into an unconstrained sampling problem by introducing a penalty function for constraint violations. When $f$ is smooth and gradients are available, we obtain $\tilde{\mathcal{O}}(d/\varepsilon^{10})$ iteration complexity for PLD to sample the target up to an $\varepsilon$-error, where the error is measured in the total variation (TV) distance and $\tilde{\mathcal{O}}(\cdot)$ hides logarithmic factors. For PULMC, we improve the result to $\tilde{\mathcal{O}}(\sqrt{d}/\varepsilon^{7})$ when the Hessian of $f$ is Lipschitz and the boundary of $\mathcal{C}$ is sufficiently smooth. To our knowledge, these are the first convergence results for underdamped Langevin Monte Carlo methods in constrained sampling that handle non-convex $f$, and they provide guarantees with the best dimension dependency among existing methods with deterministic gradients. If unbiased stochastic estimates of the gradient of $f$ are available, we propose PSGLD and PSGULMC methods that can handle stochastic gradients and are scalable to large datasets without requiring Metropolis-Hastings correction steps. For PSGLD and PSGULMC, when $f$ is strongly convex and smooth, we obtain $\tilde{\mathcal{O}}(d/\varepsilon^{18})$ and $\tilde{\mathcal{O}}(d\sqrt{d}/\varepsilon^{39})$ iteration complexity, respectively, in the 2-Wasserstein distance. When $f$ is smooth and can be non-convex, we provide finite-time performance bounds and iteration complexity results. Finally, we illustrate the performance on Bayesian LASSO regression and Bayesian constrained deep learning problems.
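
The penalty idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes the penalty is the squared distance to $\mathcal{C}$ scaled by $1/\delta$, takes $\mathcal{C}$ to be the unit Euclidean ball, and uses a plain Euler-Maruyama discretization of overdamped Langevin dynamics. The names `project_onto_ball`, `penalized_langevin`, `step`, and `delta`, as well as the quadratic choice of $f$, are hypothetical and chosen only for illustration.

```python
import numpy as np

def project_onto_ball(x, radius=1.0):
    """Euclidean projection onto the ball {x : ||x|| <= radius}."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def penalized_langevin(grad_f, project, x0, step, delta, n_iters, rng):
    """Unconstrained Langevin iterations on f(x) + dist(x, C)^2 / delta.

    Assumption: the penalty is the squared distance to C divided by delta;
    its gradient is 2 * (x - proj_C(x)) / delta, which vanishes inside C.
    """
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_iters, x.size))
    for k in range(n_iters):
        grad_penalty = 2.0 * (x - project(x)) / delta
        noise = rng.standard_normal(x.size)
        # Euler-Maruyama step of overdamped Langevin dynamics.
        x = x - step * (grad_f(x) + grad_penalty) + np.sqrt(2.0 * step) * noise
        samples[k] = x
    return samples

# Illustrative target: f(x) = 0.5 * ||x - c||^2 with c outside the unit ball,
# so the constraint is active and the penalty has to pull iterates back.
c = np.array([2.0, 0.0])
grad_f = lambda x: x - c

rng = np.random.default_rng(0)
samples = penalized_langevin(
    grad_f, project_onto_ball, x0=np.zeros(2),
    step=1e-3, delta=1e-2, n_iters=5000, rng=rng,
)
```

In this sketch the iterates are never projected or rejected. They may leave the ball slightly, and a smaller `delta` keeps them closer to $\mathcal{C}$ at the cost of a stiffer gradient, which in turn calls for a smaller `step`.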

Convergence Acceleration of Markov Chain Monte Carlo-based Gradient Descent by Deep Unfolding

Accelerating Convergence of Stein Variational Gradient Descent via Deep Unfolding

Even Faster Accelerated Coordinate Descent Using Non-Uniform Sampling

Optimization by Parallel Quasi-Quantum Annealing with Gradient-Based Sampling

Convergence Rates of Accelerated Markov Gradient Descent with Applications in Reinforcement Learning

On Markov Chain Gradient Descent

Stagewise Accelerated Stochastic Gradient Methods for Nonconvex Optimization

Fast Unconstrained Optimization via Hessian Averaging and Adaptive Gradient Sampling Methods

Convergence Analysis of a Quasi-Monte Carlo-based Deep Learning Algorithm for Solving Partial Differential Equations

An adaptive Hessian approximated stochastic gradient MCMC method

Improving sample efficiency of high dimensional Bayesian optimization with MCMC

Accelerated Forward-Backward Optimization Using Deep Learning

Stochastic Recursive Gradient Descent Ascent for Stochastic Nonconvex-Strongly-Concave Minimax Problems

Penalized Overdamped and Underdamped Langevin Monte Carlo Algorithms for Constrained Sampling

Decentralized Stochastic Gradient Descent Ascent for Finite-Sum Minimax Problems

Non-convex Bayesian Learning via Stochastic Gradient Markov Chain Monte Carlo

Derivative-Free Optimization via Adaptive Sampling Strategies

Quasi-Monte Carlo sampling for machine-learning partial differential equations

Optimal Adaptive and Accelerated Stochastic Gradient Descent

MUSIC: Accelerated Convergence for Distributed Optimization With Inexact and Exact Methods

Accelerated Doubly Stochastic Gradient Algorithm for Large-scale Empirical Risk Minimization