Abstract: It is well known that adding any skew symmetric matrix to the gradient of the Langevin dynamics algorithm results in a non-reversible diffusion with improved convergence rate. This paper presents a gradient algorithm to adaptively optimize the choice of the skew symmetric matrix. The resulting algorithm involves a non-reversible diffusion algorithm cross coupled with a stochastic gradient algorithm that adapts the skew symmetric matrix. The algorithm uses the same data as the classical Langevin algorithm. A weak convergence proof is given for the optimality of the choice of the skew symmetric matrix. The improved convergence rate of the algorithm is illustrated numerically in Bayesian learning and tracking examples.
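
As a brief worked illustration of the first sentence of the abstract, the following sketch shows why a skew-symmetric perturbation leaves the stationary distribution unchanged. The notation is an assumption made here and is not taken from the paper: \( U \) is the potential (for example, a negative log posterior), \( \pi(\theta) \propto e^{-U(\theta)} \) is the target density, \( S = -S^\top \) is the skew-symmetric matrix, and the temperature is set to one. The paper may use a different sign or scaling convention.

```latex
% Non-reversible Langevin diffusion (assumed convention, unit temperature)
\[
  d\theta_t = -(I + S)\,\nabla U(\theta_t)\,dt + \sqrt{2}\,dW_t ,
  \qquad S^\top = -S .
\]
% Stationary Fokker-Planck equation for a density p
\[
  0 = \nabla \cdot \bigl( (I + S)\,\nabla U\, p + \nabla p \bigr).
\]
% Substitute p = \pi \propto e^{-U}, so that \nabla \pi = -\nabla U\,\pi:
% the reversible part cancels, and the remaining term vanishes because
% S is skew-symmetric while the Hessian of U is symmetric.
\begin{align*}
  \nabla U\,\pi + \nabla \pi &= 0, \\
  \nabla \cdot ( S\,\nabla U\,\pi )
    &= \pi \Bigl( \sum_{i,j} S_{ij}\,\partial_i \partial_j U
       - \nabla U^\top S^\top \nabla U \Bigr) = 0 .
\end{align*}
```

Under these assumptions, every choice of \( S \) targets the same density \( \pi \), so \( S \) is a free design parameter that affects only how fast the diffusion converges.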

What problem does this paper attempt to address?

The core problem that this paper attempts to solve is how to select the skew-symmetric matrix through adaptive optimization so as to accelerate the convergence of the non-reversible diffusion process, thereby improving the performance of the Langevin dynamics algorithm in Bayesian learning and global stochastic optimization.

### Problem Background

Langevin dynamics is a method for global stochastic optimization. It can also be used as a non-parametric method to reconstruct (explore) a cost function, such as a posterior density, from noisy evaluations of its gradient. However, traditional Langevin dynamics is a reversible diffusion process, and its convergence to the stationary distribution may be slow. Previous studies have shown that adding an arbitrary skew-symmetric matrix to the gradient yields a non-reversible diffusion process with an improved convergence rate.

### Main Contributions of the Paper

1. **Adaptive Optimization of the Skew-Symmetric Matrix**:
   - The paper proposes an adaptive algorithm that adjusts the skew-symmetric matrix \( S \) in real time to further improve the convergence rate of the non-reversible diffusion process.
   - Specifically, the algorithm combines a non-reversible diffusion process with a stochastic gradient algorithm that updates \( S \), forming a cross-coupled structure.
2. **Three Specific Adaptive Algorithms**:
   - **Hessian-Based Algorithm**: Updates \( S \) by computing the Hessian matrix, at a relatively high computational cost.
   - **SPSA (Simultaneous Perturbation Stochastic Approximation) Algorithm**: Estimates the gradient by a finite-difference method, which is more computationally efficient.
   - **Two-Time-Scale SPSA Algorithm**: Estimates the gradient on a fast time scale and updates \( S \) on a slow time scale, which is suitable for more complex scenarios.
3. **Non-Stationary Global Optimization and Tracking Analysis**:
   - The paper also studies how the proposed algorithms track the global optimum in a non-stationary environment, that is, when the global optimal solution changes over time.
   - It models the changes in the optimal solution as a Markov chain and analyzes the consistency of the algorithms in this setting.

### Numerical Experiment Results

Through numerical experiments, the paper shows the performance of the three adaptive algorithms in Bayesian learning and KL-divergence estimation. Especially in high-dimensional problems, they converge faster than both traditional Langevin dynamics and the accelerated non-reversible diffusion algorithms.

### Conclusion

By adaptively optimizing the skew-symmetric matrix, the paper improves the convergence rate and performance of the Langevin dynamics algorithm in application scenarios such as Bayesian learning and global stochastic optimization.
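
To make the two building blocks named in the summary more concrete, here is a minimal Python sketch of (a) an Euler-Maruyama discretization of a non-reversible Langevin diffusion with a fixed skew-symmetric matrix and (b) a two-sided SPSA gradient estimate. It is an illustration under stated assumptions, not the paper's algorithm: the function names `make_skew`, `nonreversible_langevin`, and `spsa_gradient` are hypothetical, the drift convention \( -(I + S)\nabla U \) with unit temperature is assumed, the target is a toy standard Gaussian, and the SPSA routine is exercised on a toy quadratic cost because the summary does not state the cost the paper uses to adapt \( S \).

```python
import numpy as np


def make_skew(params, d):
    """Build a d x d skew-symmetric matrix from d*(d-1)/2 free parameters."""
    S = np.zeros((d, d))
    upper = np.triu_indices(d, k=1)
    S[upper] = params          # fill the strict upper triangle
    return S - S.T             # mirror with opposite sign, so S.T == -S


def nonreversible_langevin(grad_U, theta0, S, step, n_steps, rng):
    """Euler-Maruyama steps for d(theta) = -(I + S) grad U dt + sqrt(2) dW.

    The drift convention is an assumption of this sketch.
    """
    d = theta0.size
    A = np.eye(d) + S
    theta = theta0.astype(float).copy()
    path = np.empty((n_steps, d))
    for k in range(n_steps):
        noise = rng.standard_normal(d)
        theta = theta - step * (A @ grad_U(theta)) + np.sqrt(2.0 * step) * noise
        path[k] = theta
    return path


def spsa_gradient(cost, params, c, rng):
    """Two-sided SPSA gradient estimate from a single random +/-1 perturbation."""
    delta = rng.choice([-1.0, 1.0], size=params.size)
    diff = cost(params + c * delta) - cost(params - c * delta)
    return diff / (2.0 * c * delta)


# Toy usage: standard Gaussian target, U(theta) = ||theta||^2 / 2, so grad U = theta.
rng = np.random.default_rng(0)
S = make_skew(np.array([1.0]), d=2)
path = nonreversible_langevin(lambda th: th, np.zeros(2), S,
                              step=0.01, n_steps=20000, rng=rng)

# Toy usage of SPSA on a one-dimensional quadratic cost (not the paper's cost).
grad_est = spsa_gradient(lambda p: float(np.sum(p ** 2)), np.array([3.0]),
                         c=0.1, rng=rng)
```

In an adaptive scheme of the kind the summary describes, an estimate such as `grad_est` would drive a slow stochastic gradient update of the free parameters of \( S \) while the sampler keeps running. That coupling is omitted here because its cost function and step sizes are not specified in this summary.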

Adaptive Non-reversible Stochastic Gradient Langevin Dynamics

Non-convex Bayesian Learning via Stochastic Gradient Markov Chain Monte Carlo

Langevin Dynamics: A Unified Perspective on Optimization via Lyapunov Potentials

Non-asymptotic Analysis of Biased Adaptive Stochastic Approximation

Adaptive Stochastic Gradient Langevin Dynamics: Taming Convergence and Saddle Point Escape Time

Kalman Gradient Descent: Adaptive Variance Reduction in Stochastic Optimization

Convergence Analysis of Asynchronous Stochastic Recursive Gradient Algorithms

Scalable Gradients for Stochastic Differential Equations

Novel Convergence Results of Adaptive Stochastic Gradient Descents

Multi-kernel Passive Stochastic Gradient Algorithms and Transfer Learning

Exact Langevin Dynamics with Stochastic Gradients

Generalized EXTRA stochastic gradient Langevin dynamics

Langevin algorithms for Markovian Neural Networks and Deep Stochastic control

Scalable Stochastic Gradient Riemannian Langevin Dynamics in Non-Diagonal Metrics

Convergence Analysis of Adaptive Gradient Methods under Refined Smoothness and Noise Assumptions

Non asymptotic analysis of Adaptive stochastic gradient algorithms and applications

Stochastic Approximate Gradient Descent via the Langevin Algorithm

Sampling as optimization in the space of measures: The Langevin dynamics as a composite optimization problem

Towards Noise-adaptive, Problem-adaptive (Accelerated) Stochastic Gradient Descent

Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization

Stochastic Gradient Descent as Approximate Bayesian Inference