Convergence of Langevin-simulated annealing algorithms with multiplicative noise

Pierre Bras, Gilles Pagès
DOI: https://doi.org/10.1090/mcom/3899
IF: 2.118
2024-03-16
Mathematics of Computation
Abstract: We study the convergence of Langevin-simulated annealing type algorithms with multiplicative noise. Namely, for a potential function $V : \mathbb{R}^d \to \mathbb{R}$ to minimize, we consider the stochastic differential equation $dY_t = -\sigma\sigma^\top \nabla V(Y_t)\,dt + a(t)\sigma(Y_t)\,dW_t + a(t)^2\Upsilon(Y_t)\,dt$, where $(W_t)$ is a Brownian motion, $\sigma : \mathbb{R}^d \to \mathcal{M}_d(\mathbb{R})$ is an adaptive (multiplicative) noise, $a : \mathbb{R}^+ \to \mathbb{R}^+$ is a function decreasing to $0$, and $\Upsilon$ is a correction term. This setting applies to optimization problems arising in machine learning; allowing $\sigma$ to depend on the position brings faster convergence than the classical Langevin equation $dY_t = -\nabla V(Y_t)\,dt + \sigma\,dW_t$. The case where $\sigma$ is a constant matrix has been studied extensively, but little attention has been paid to the general case. We prove the convergence, in $L^1$-Wasserstein distance, of $Y_t$ and of the associated Euler scheme $\bar{Y}_t$ to a measure $\nu^\star$ supported on $\operatorname{argmin}(V)$, and we give rates of convergence to the instantaneous Gibbs measure $\nu_{a(t)}$ with density $\propto \exp(-2V(x)/a(t)^2)$. To do so, we first consider the case where $a$ is a piecewise constant function, and we recover the classical schedule $a(t) = A\log^{-1/2}(t)$. We then prove the convergence in the general case by bounding the Wasserstein distance to the piecewise constant case using ergodicity properties.
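The Euler scheme $\bar{Y}$ mentioned above can be illustrated with a minimal one-dimensional sketch. This is not the authors' implementation. It uses the update $\bar{Y}_{n+1} = \bar{Y}_n + \gamma\big(-\sigma^2 V'(\bar{Y}_n) + a(t_n)^2\Upsilon(\bar{Y}_n)\big) + a(t_n)\sigma(\bar{Y}_n)\sqrt{\gamma}\,\xi_{n+1}$ with $t_n = n\gamma$ and i.i.d. standard Gaussians $\xi_n$. Several choices here are hypothetical: the tilted double-well $V$, the noise $\sigma(x) = 1 + 0.5\cos x$, the step $\gamma = 0.01$, and the shifted schedule $a(t) = A/\sqrt{\log(t + e)}$ with $A = 1.5$, which avoids the singularity of $A\log^{-1/2}(t)$ at small $t$. The correction $\Upsilon = \tfrac12 (\sigma^2)' = \sigma\sigma'$ is also an assumption. It was chosen because a Fokker-Planck computation shows it makes $\exp(-2V/a^2)$ invariant when $a$ is frozen. The paper's exact definition may differ.

```python
import math
import random


def V(x: float) -> float:
    """Hypothetical tilted double well: global min near x = -1.04, local min near x = 0.96."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def grad_V(x: float) -> float:
    return 4.0 * x * (x * x - 1.0) + 0.3


def sigma(x: float) -> float:
    """Hypothetical multiplicative (position-dependent) noise, bounded in [0.5, 1.5]."""
    return 1.0 + 0.5 * math.cos(x)


def upsilon(x: float) -> float:
    """Assumed correction term Upsilon = (1/2) d/dx sigma^2 = sigma * sigma' (d = 1)."""
    return sigma(x) * (-0.5 * math.sin(x))


def schedule(t: float, A: float = 1.5) -> float:
    """a(t) = A / sqrt(log(t + e)): A log^{-1/2}(t) shifted (hypothetically) so a(0) = A."""
    return A / math.sqrt(math.log(t + math.e))


def euler_annealing(y0, step, n_steps, rng, a=schedule):
    """Euler-Maruyama scheme for
    dY = -sigma^2 V'(Y) dt + a(t) sigma(Y) dW + a(t)^2 Upsilon(Y) dt."""
    y = y0
    sqrt_step = math.sqrt(step)
    for n in range(n_steps):
        a_t = a(n * step)  # noise level at time t_n = n * step
        s = sigma(y)
        drift = -s * s * grad_V(y) + a_t * a_t * upsilon(y)
        y += step * drift + a_t * s * sqrt_step * rng.gauss(0.0, 1.0)
    return y


if __name__ == "__main__":
    rng = random.Random(0)
    # Start every path in the non-global well at x = 1 and run up to t = 200.
    finals = [euler_annealing(1.0, 0.01, 20_000, rng) for _ in range(50)]
    frac = sum(y < 0.0 for y in finals) / len(finals)
    print(f"fraction of paths in the global well at t = 200: {frac:.2f}")
```

Because $a(t)$ decays only logarithmically, $a(200) \approx 0.65$ is still far from 0, so the final iterates spread around the global minimizer rather than sitting on it. In dimension $d > 1$ the same assumption gives $\Upsilon_i = \tfrac12 \sum_j \partial_j (\sigma\sigma^\top)_{ij}$.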
mathematics, applied