Abstract: The rise of artificial intelligence (AI) hinges on the efficient training of modern deep neural networks (DNNs) for non-convex optimization and uncertainty quantification, which boils down to a non-convex Bayesian learning problem. A standard tool for this problem is Langevin Monte Carlo, which approximates the posterior distribution with theoretical guarantees. In this thesis, we start with replica exchange Langevin Monte Carlo (also known as parallel tempering), which proposes appropriate swaps between exploration and exploitation to achieve acceleration. However, the naïve extension of swaps to big data problems leads to a large bias, and bias-corrected swaps are required. Such a mechanism leads to few effective swaps and insignificant acceleration. To alleviate this issue, we first propose a control variates method that reduces the variance of noisy energy estimators and show its potential to accelerate the exponential convergence. We also present a population-chain replica exchange method based on non-reversibility and obtain an optimal round-trip rate for deep learning. In the second part of the thesis, we study scalable dynamic importance sampling algorithms based on stochastic approximation. Traditional dynamic importance sampling algorithms have achieved success; however, their lack of scalability has greatly limited their extension to big data. To handle this scalability issue, we resolve the vanishing gradient problem and propose two dynamic importance sampling algorithms. Theoretically, we establish the stability condition for the underlying ordinary differential equation (ODE) system and guarantee the asymptotic convergence of the latent variable to the desired fixed point. Interestingly, this result still holds for non-convex energy landscapes.
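
As a rough illustration of the replica exchange idea mentioned in the abstract, the following minimal sketch runs two Langevin chains at different temperatures on a toy one-dimensional double-well energy and swaps their states with the standard parallel tempering acceptance probability. It is not the thesis's algorithm: the energy function, the step size, the temperatures, and all function names are assumptions chosen for illustration, and it uses exact energies rather than the noisy mini-batch estimators and bias-corrected swaps that the abstract discusses.

```python
import numpy as np


def energy(x):
    # Toy double-well energy (illustrative assumption): minima at x = -1 and x = +1.
    return (x**2 - 1.0) ** 2


def grad_energy(x):
    # Analytic gradient of the toy energy above.
    return 4.0 * x * (x**2 - 1.0)


def swap_probability(u_low, u_high, tau_low, tau_high):
    # Standard parallel tempering acceptance probability with exact energies:
    # min(1, exp((1/tau_low - 1/tau_high) * (U(x_low) - U(x_high)))).
    log_ratio = (1.0 / tau_low - 1.0 / tau_high) * (u_low - u_high)
    return float(np.exp(min(0.0, log_ratio)))


def replica_exchange_langevin(n_steps=5000, step=1e-2, tau_low=0.05, tau_high=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x_low, x_high = -1.0, -1.0  # both chains start in the left well
    samples = np.empty(n_steps)
    n_swaps = 0
    for t in range(n_steps):
        # Unadjusted Langevin step for each chain at its own temperature.
        x_low = x_low - step * grad_energy(x_low) \
            + np.sqrt(2.0 * step * tau_low) * rng.standard_normal()
        x_high = x_high - step * grad_energy(x_high) \
            + np.sqrt(2.0 * step * tau_high) * rng.standard_normal()
        # Propose exchanging the two states.
        p = swap_probability(energy(x_low), energy(x_high), tau_low, tau_high)
        if rng.random() < p:
            x_low, x_high = x_high, x_low
            n_swaps += 1
        samples[t] = x_low  # record the low-temperature (exploitation) chain
    return samples, n_swaps


if __name__ == "__main__":
    samples, n_swaps = replica_exchange_langevin()
    print("accepted swaps:", n_swaps)
    print("fraction of low-temperature samples with x > 0:", np.mean(samples > 0))
```

In this sketch the high-temperature chain plays the exploration role and the low-temperature chain the exploitation role. A swap is always accepted when the low-temperature chain currently holds the higher-energy state, which is how the low-temperature chain can escape a local well.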

Langevin Dynamics with Continuous Tempering for High-dimensional Non-convex Optimization

Langevin Dynamics with Continuous Tempering for Training Deep Neural Networks

CoolMomentum: A Method for Stochastic Optimization by Langevin Dynamics with Simulated Annealing

Non-convex Bayesian Learning via Stochastic Gradient Markov Chain Monte Carlo

Provable Convergence and Limitations of Geometric Tempering for Langevin Dynamics

Mean-field Langevin System, Optimal Control and Deep Neural Networks

Training Deep Neural Networks by optimizing over nonlocal paths in hyperparameter space

Langevin Dynamics: A Unified Perspective on Optimization via Lyapunov Potentials

Partitioned integrators for thermodynamic parameterization of neural networks

Annealing Optimization for Progressive Learning With Stochastic Approximation

Quantum Langevin Dynamics for Optimization

Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization

Variational Tempering

Langevin algorithms for Markovian Neural Networks and Deep Stochastic control

Charting the Topography of the Neural Network Landscape with Thermal-Like Noise

Polygonal Unadjusted Langevin Algorithms: Creating stable and efficient adaptive algorithms for neural networks

Langevin dynamics based algorithm e-TH$\varepsilon$O POULA for stochastic optimization problems with discontinuous stochastic gradient

Optimizing Temperature Distributions for Training Neural Quantum States using Parallel Tempering

Limiting Behaviors of Nonconvex-Nonconcave Minimax Optimization via Continuous-Time Systems

Efficient Adaptive Optimization via Subset-Norm and Subspace-Momentum: Fast, Memory-Reduced Training with Convergence Guarantees