Accelerated analysis on the triple momentum method for a two-layer ReLU neural network

Xin Li, Xin Liu
DOI: https://doi.org/10.1016/j.jksuci.2024.102016
IF: 9.006
2024-03-30
Journal of King Saud University - Computer and Information Sciences
Abstract: The momentum method has become the workhorse of the deep learning community. To understand its success theoretically, researchers have worked to demystify its convergence properties when optimizing neural networks. For convex problems, it is well known that the triple momentum (TM) method achieves the fastest theoretical convergence rate among all first-order methods. However, no theoretical convergence results exist for the TM method on the non-convex problem of training neural networks, let alone an acceleration guarantee. In this paper, we focus on the training process of the TM method for a two-layer ReLU neural network. Inspired by the accurate characterization offered by high-resolution dynamical systems, we consider the high-resolution ordinary differential equation (ODE) of the TM method. Under the over-parameterization assumption, we show that the original non-convex optimization problem can be transformed into a strongly convex one. By applying an appropriate Lyapunov function, we prove that the TM method converges linearly to a global minimum. Compared with the heavy ball method and Nesterov's accelerated gradient method, our result provides the first guarantee for the acceleration of the TM method in training neural networks. Empirical experiments validate the accelerated convergence of the TM method and the effect of over-parameterization.
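For readers unfamiliar with the TM method named in the abstract, the following is a minimal sketch of its iteration on a small strongly convex quadratic. It is not the paper's two-layer ReLU setting or its high-resolution ODE. The parameter formulas are the standard ones for an m-strongly convex, L-smooth objective and are an assumption here, since this listing does not state the paper's step sizes. The function name `triple_momentum` and the test problem (`A`, `b`) are hypothetical, chosen only for illustration.

```python
import numpy as np

def triple_momentum(grad, x0, m, L, num_iters=200):
    """Sketch of the triple momentum iteration (assumed standard parameters).

    grad: gradient of an m-strongly convex, L-smooth function.
    """
    kappa = L / m                       # condition number
    rho = 1.0 - 1.0 / np.sqrt(kappa)    # linear convergence factor
    alpha = (1.0 + rho) / L             # step size
    beta = rho**2 / (2.0 - rho)         # momentum on the state update
    gamma = rho**2 / ((1.0 + rho) * (2.0 - rho))  # momentum on the gradient point
    delta = rho**2 / (1.0 - rho**2)     # momentum on the output point

    xi_prev = np.array(x0, dtype=float)
    xi = xi_prev.copy()
    for _ in range(num_iters):
        y = (1.0 + gamma) * xi - gamma * xi_prev   # point where the gradient is taken
        xi_next = (1.0 + beta) * xi - beta * xi_prev - alpha * grad(y)
        xi_prev, xi = xi, xi_next
    # Output combines the last two states with the third momentum term.
    return (1.0 + delta) * xi - delta * xi_prev

# Hypothetical test problem: f(x) = 0.5 * x^T A x - b^T x with m = 1, L = 100.
A = np.diag([1.0, 10.0, 100.0])
b = np.ones(3)
x_star = np.linalg.solve(A, b)          # exact minimizer for comparison

x_tm = triple_momentum(lambda x: A @ x - b, np.zeros(3), m=1.0, L=100.0)
print(np.linalg.norm(x_tm - x_star))    # distance to the minimizer
```

The three coefficients `beta`, `gamma`, and `delta` are what give the method its name: one momentum term drives the state update, one selects the point where the gradient is evaluated, and one forms the output iterate.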
computer science, information systems