Adaptive Non-reversible Stochastic Gradient Langevin Dynamics

Vikram Krishnamurthy, George Yin
DOI: https://doi.org/10.48550/arXiv.2009.12690
2020-09-27
Abstract: It is well known that adding any skew symmetric matrix to the gradient of Langevin dynamics algorithm results in a non-reversible diffusion with improved convergence rate. This paper presents a gradient algorithm to adaptively optimize the choice of the skew symmetric matrix. The resulting algorithm involves a non-reversible diffusion algorithm cross coupled with a stochastic gradient algorithm that adapts the skew symmetric matrix. The algorithm uses the same data as the classical Langevin algorithm. A weak convergence proof is given for the optimality of the choice of the skew symmetric matrix. The improved convergence rate of the algorithm is illustrated numerically in Bayesian learning and tracking examples.
Machine Learning, Systems and Control
What problem does this paper attempt to address?
The core problem the paper addresses is how to choose the skew-symmetric matrix adaptively so that the non-reversible diffusion converges faster, thereby improving the performance of the Langevin dynamics algorithm in Bayesian learning and global stochastic optimization.

### Problem Background

Langevin dynamics is a method for global stochastic optimization. It can also be used as a non-parametric method to reconstruct (explore) a cost function, such as a posterior density, from noisy evaluations of its gradient. However, classical Langevin dynamics is a reversible diffusion, and its convergence to the stationary distribution can be slow. Previous studies have shown that adding an arbitrary skew-symmetric matrix to the gradient term yields a non-reversible diffusion with an improved convergence rate.

### Main Contributions of the Paper

1. **Adaptive Optimization of the Skew-Symmetric Matrix**:
   - The paper proposes an algorithm that adapts the skew-symmetric matrix \( S \) in real time to further improve the convergence rate of the non-reversible diffusion.
   - Specifically, the algorithm couples the non-reversible diffusion with a second stochastic gradient algorithm that updates \( S \), forming a cross-coupled structure.
2. **Three Specific Adaptive Algorithms**:
   - **Hessian-Based Algorithm**: Updates \( S \) using the Hessian matrix, at a relatively high computational cost.
   - **SPSA (Simultaneous Perturbation Stochastic Approximation) Algorithm**: Estimates the gradient with a finite-difference approximation along a random simultaneous perturbation, which is more computationally efficient.
   - **Two-Time-Scale SPSA Algorithm**: Estimates the gradient on a fast time scale and updates \( S \) on a slow time scale, which suits more complex scenarios.
3. **Non-Stationary Global Optimization and Tracking Analysis**:
   - The paper also studies how the proposed algorithms track the global optimum in a non-stationary environment, that is, when the optimum changes over time.
   - It models the changes of the optimum with a Markov chain and analyzes the consistency of the algorithms in this setting.

### Numerical Experiment Results

In numerical experiments on Bayesian learning and KL-divergence estimation, the three adaptive algorithms outperform the baselines. In high-dimensional problems in particular, they converge faster than both classical Langevin dynamics and accelerated non-reversible diffusion algorithms.

### Conclusion

By adaptively optimizing the skew-symmetric matrix, the paper improves the convergence rate and performance of the Langevin dynamics algorithm in applications such as Bayesian learning and global stochastic optimization.
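
To make the two building blocks named under the main contributions more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's algorithm. The function names (`skew_from_params`, `nonreversible_langevin`, `spsa_gradient`), the Euler-Maruyama discretization, the step size, and the Gaussian toy target are all choices made for this example. The sketch keeps \( S \) fixed, because the summary above does not state the cost function the paper uses to adapt \( S \); the SPSA helper only shows the generic two-evaluation gradient estimate that such an adaptation step could build on.

```python
import numpy as np


def skew_from_params(a, d):
    """Build a d x d skew-symmetric matrix from d*(d-1)/2 free parameters."""
    S = np.zeros((d, d))
    S[np.triu_indices(d, k=1)] = a  # fill the strict upper triangle
    return S - S.T                  # mirror with opposite sign


def nonreversible_langevin(grad_U, theta0, S, step, n_steps, rng):
    """Euler-Maruyama steps for d(theta) = -(I + S) grad U(theta) dt + sqrt(2) dW.

    With S = 0 this reduces to the classical (reversible) Langevin update.
    """
    theta = np.array(theta0, dtype=float)
    d = theta.size
    drift_matrix = np.eye(d) + S
    samples = np.empty((n_steps, d))
    for k in range(n_steps):
        noise = rng.standard_normal(d)
        theta = theta - step * (drift_matrix @ grad_U(theta)) + np.sqrt(2.0 * step) * noise
        samples[k] = theta
    return samples


def spsa_gradient(f, x, c, rng):
    """Single-draw SPSA estimate of grad f(x) from two evaluations of f."""
    delta = rng.choice([-1.0, 1.0], size=x.shape)  # random +/-1 perturbation
    return (f(x + c * delta) - f(x - c * delta)) / (2.0 * c * delta)


def grad_gaussian(theta):
    """Gradient of the toy potential U(theta) = 0.5 * ||theta||^2."""
    return theta


# Toy run: sample a 2-D standard Gaussian with a fixed, hand-picked S.
rng = np.random.default_rng(0)
S = skew_from_params(np.array([1.0]), d=2)
samples = nonreversible_langevin(grad_gaussian, [3.0, -3.0], S, 0.05, 50_000, rng)
kept = samples[5_000:]  # discard burn-in
print("sample mean:", kept.mean(axis=0))
print("sample variances:", kept.var(axis=0))
```

In continuous time the skew-symmetric term leaves the stationary density proportional to \( e^{-U} \) unchanged, so the sample mean and variances should land near those of the Gaussian target, up to discretization bias and Monte Carlo error. An adaptive variant would replace the fixed `S` with a parameter vector updated by a stochastic gradient step, which is the cross-coupled structure the summary describes.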