Achieving Acceleration in Distributed Optimization via Direct Discretization of the Heavy-Ball ODE

Jingzhao Zhang, César A. Uribe, Aryan Mokhtari, Ali Jadbabaie
DOI: https://doi.org/10.48550/arXiv.1811.02521
2018-11-07
Abstract: We develop a distributed algorithm for convex Empirical Risk Minimization, the problem of minimizing a large but finite sum of convex functions over networks. The proposed algorithm is derived from directly discretizing the second-order heavy-ball differential equation and results in an accelerated convergence rate, i.e., faster than distributed gradient descent-based methods for strongly convex objectives that may not be smooth. Notably, we achieve acceleration without resorting to the well-known Nesterov's momentum approach. We provide numerical experiments and contrast the proposed method with recently proposed optimal distributed optimization algorithms.
Optimization and Control
What problem does this paper attempt to address?
This paper aims to accelerate convergence in distributed optimization. Specifically, it considers minimizing a large but finite sum of convex functions over a network of nodes, a problem known as Empirical Risk Minimization (ERM). Distributed gradient descent-based methods apply to this setting, but they converge slowly for strongly convex objectives that may not be smooth. The paper proposes an algorithm that attains a faster (accelerated) convergence rate by directly discretizing the heavy-ball ordinary differential equation (ODE). The acceleration does not rely on Nesterov's momentum, so the paper offers a different acceleration mechanism.

### Main contributions of the paper

1. **A new algorithm**: A distributed optimization algorithm obtained by directly discretizing the second-order heavy-ball ODE.
2. **Accelerated convergence**: For strongly convex objectives, the algorithm is proved to converge at the rate \( O(N^{-\frac{2s}{s + 1}}) \) after \( N \) iterations, where \( s \) is the order of the integrator.
3. **Theoretical analysis**: A detailed analysis of the algorithm's convergence properties and performance guarantees.
4. **Numerical experiments**: Experiments that verify the algorithm's effectiveness and compare it with recently proposed optimal distributed optimization algorithms.

### Key techniques and methods

- **Heavy-ball ODE**: A second-order continuous-time model of the accelerated gradient method.
- **Runge-Kutta integrator**: Used to discretize the second-order heavy-ball ODE into a stable and efficient optimization algorithm.
- **Distributed implementation**: The algorithm is computed in a distributed manner, through communication between nodes and exchange of local information.

### Mathematical formulas

- **Objective function**:
  \[ \min_{x\in\mathbb{R}^p} f(x)=\sum_{i = 1}^n f_i(x) \]
  where \( f_i \) is the local convex function of node \( i \).
- **Dual problem**:
  \[ \min_{y\in\mathbb{R}^{np}}\phi(y) \]
  where the dual function \( \phi(y) \) is defined as:
  \[ \phi(y)=\max_{x\in\mathbb{R}^{np}}\left\{\langle y,\sqrt{L}x\rangle - F(x)\right\} \]
- **Update rules** (one step of an explicit \( S \)-stage integrator with step size \( h \) and coefficients \( a_{ij} \), \( b_i \)):
  \[ g_i=\zeta_0+h\sum_{j = 1}^{i - 1}a_{ij}G(g_j) \]
  \[ \Phi_h(\zeta_0)=\zeta_0+h\sum_{i = 1}^S b_i G(g_i) \]
- **Convergence rate** (bounds that decay in the iteration count \( N \), where \( s \) is the order of the integrator):
  \[ \|\sqrt{L}x_N\|^2\leq O\left(\frac{\lambda_{\max}(L)^3}{\mu^3 S}\,N^{-\frac{2s}{s + 1}}\right) \]
  \[ \frac{1}{n}\|x_N - x^*\|^2\leq O\left(\frac{\lambda_{\max}(L)^2}{n\mu^3 S}\,N^{-\frac{2s}{s + 1}}\right) \]
  \[ \frac{1}{n}[F(x_N)-F(x^*)]\leq O\left(\frac{\sqrt{S}\lambda_{\max}(L)^3}{n^2\mu^3\lambda_{\min}^+(L)M}\,N^{-\frac{s}{s + 1}}\right) \]
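
The update rules above have the generic form of an explicit Runge-Kutta step, so a small single-machine sketch may help make them concrete. The code below is an illustration under stated assumptions, not the paper's distributed algorithm: it assumes a constant-damping heavy-ball ODE \( \ddot{y} + \gamma\dot{y} + \nabla f(y) = 0 \), uses the classical RK4 tableau (\( S = 4 \), \( s = 4 \)) as the integrator, and minimizes a toy strongly convex quadratic rather than the dual function \( \phi \). The names `rk_step`, `heavy_ball_field`, and `minimize`, and the values of `gamma` and `h`, are hypothetical choices made for this example.

```python
import numpy as np

# Classical RK4 tableau (an assumption; any explicit tableau with a
# strictly lower-triangular coefficient matrix a_ij would fit the formula).
A = np.array([[0.0, 0.0, 0.0, 0.0],
              [0.5, 0.0, 0.0, 0.0],
              [0.0, 0.5, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
B = np.array([1 / 6, 1 / 3, 1 / 3, 1 / 6])


def rk_step(G, zeta0, h, a=A, b=B):
    """One explicit Runge-Kutta step Phi_h(zeta0) for zeta' = G(zeta)."""
    stage_values = []  # stores G(g_1), ..., G(g_S)
    for i in range(len(b)):
        # g_i = zeta0 + h * sum_{j < i} a_ij * G(g_j)
        g_i = zeta0 + h * sum(a[i, j] * stage_values[j] for j in range(i))
        stage_values.append(G(g_i))
    # Phi_h(zeta0) = zeta0 + h * sum_i b_i * G(g_i)
    return zeta0 + h * sum(b_i * k for b_i, k in zip(b, stage_values))


def heavy_ball_field(grad, gamma):
    """First-order form of y'' + gamma * y' + grad(y) = 0 (assumed ODE)."""
    def G(zeta):
        y, v = np.split(zeta, 2)  # state is [position; velocity]
        return np.concatenate([v, -gamma * v - grad(y)])
    return G


def minimize(grad, y0, h=0.1, gamma=2.0, num_steps=200):
    """Integrate the heavy-ball ODE from rest and return the final position."""
    G = heavy_ball_field(grad, gamma)
    zeta = np.concatenate([y0, np.zeros_like(y0)])
    for _ in range(num_steps):
        zeta = rk_step(G, zeta, h)
    return zeta[: y0.size]


# Toy problem: f(y) = 0.5 * y^T Q y - c^T y, whose minimizer solves Q y = c.
Q = np.diag([1.0, 4.0])
c = np.array([1.0, 2.0])
y_final = minimize(lambda y: Q @ y - c, np.zeros(2))
```

In this sketch the iterate `y_final` can be compared against the exact minimizer `np.linalg.solve(Q, c)`. Swapping in a different tableau only requires changing `A` and `B`, which mirrors how the order \( s \) of the integrator enters the convergence rate stated above.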