Abstract: We develop a distributed algorithm for convex Empirical Risk Minimization, the problem of minimizing a large but finite sum of convex functions over networks. The proposed algorithm is derived by directly discretizing the second-order heavy-ball differential equation and achieves an accelerated convergence rate, i.e., one faster than that of distributed gradient descent-based methods, for strongly convex objectives that may not be smooth. Notably, we achieve acceleration without resorting to the well-known Nesterov's momentum approach. We provide numerical experiments and contrast the proposed method with recently proposed optimal distributed optimization algorithms.

What problem does this paper attempt to address?

The paper aims to achieve accelerated convergence in distributed optimization. Specifically, it considers minimizing a large but finite sum of convex functions over a network of multiple nodes, a problem usually known as Empirical Risk Minimization (ERM). Existing methods such as distributed gradient descent are effective but converge slowly for strongly convex objective functions. The paper proposes a new algorithm that achieves a faster (accelerated) convergence rate by directly discretizing the Heavy-Ball ordinary differential equation (ODE). The acceleration does not rely on Nesterov's momentum method, so the approach offers a different acceleration mechanism.

### Main contributions of the paper

1. **Propose a new algorithm**: A new distributed optimization algorithm is designed by directly discretizing the second-order Heavy-Ball ODE.
2. **Accelerate convergence**: The algorithm is proved to converge at the faster rate \( O(N^{-\frac{2s}{s + 1}}) \) for strongly convex objective functions, where \( N \) is the number of iterations and \( s \) is the order of the integrator.
3. **Theoretical analysis**: The paper provides a detailed theoretical analysis, including the convergence properties and performance guarantees of the algorithm.
4. **Numerical experiments**: Numerical experiments verify the effectiveness of the new algorithm and compare it with existing optimal distributed optimization algorithms.

### Key techniques and methods

- **Heavy-Ball ODE (Ordinary Differential Equation)**: A continuous-time model that describes the accelerated gradient method.
- **Runge-Kutta integrator**: Used to discretize the second-order Heavy-Ball ODE into a stable and efficient optimization algorithm.
- **Distributed implementation**: The algorithm is computed in a distributed manner through communication between nodes and local information exchange.

### Mathematical formulas

- **Objective function**:
  \[ \min_{x\in\mathbb{R}^p} f(x)=\sum_{i = 1}^n f_i(x) \]
  where \( f_i \) is the local convex function of node \( i \).
- **Dual problem**:
  \[ \min_{y\in\mathbb{R}^{np}}\phi(y) \]
  where the dual function \( \phi(y) \) is defined as:
  \[ \phi(y)=\max_{x\in\mathbb{R}^{np}}\left\{\langle y,\sqrt{L}x\rangle - F(x)\right\} \]
- **Update rules** (an explicit Runge-Kutta step with \( S \) stages, step size \( h \), and coefficients \( a_{ij} \), \( b_i \), applied to the vector field \( G \)):
  \[ g_i=\zeta_0+h\sum_{j = 1}^{i - 1}a_{ij}G(g_j) \]
  \[ \Phi_h(\zeta_0)=\zeta_0+h\sum_{i = 1}^S b_i G(g_i) \]
- **Convergence rate**:
  \[ \|\sqrt{L}x_N\|^2\leq O\left(\frac{\lambda_{\max}(L)^3}{\mu^3 S}\, N^{-\frac{2s}{s + 1}}\right) \]
  \[ \frac{1}{n}\|x_N - x^*\|^2\leq O\left(\frac{\lambda_{\max}(L)^2}{n\mu^3 S}\, N^{-\frac{2s}{s + 1}}\right) \]
  \[ \frac{1}{n}[F(x_N)-F(x^*)]\leq O\left(\frac{\sqrt{S}\lambda_{\max}(L)^3}{n^2\mu^3\lambda_{\min}^+(L)M}\, N^{-\frac{s}{s + 1}}\right) \]
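
The update rules can be illustrated with a small single-node sketch. It is not the paper's distributed algorithm: it assumes a generic heavy-ball ODE of the form \( \ddot{x} + \gamma\dot{x} + \nabla f(x) = 0 \) with \( \gamma = 2\sqrt{\mu} \), a diagonal quadratic test objective, and the classical fourth-order Runge-Kutta coefficients. The names `rk_step`, `heavy_ball_field`, `CURV`, and `GAMMA` are hypothetical and chosen only for this example.

```python
import math


def rk_step(G, zeta0, h, a, b):
    """One explicit Runge-Kutta step: zeta0 + h * sum_i b_i * G(g_i)."""
    n = len(zeta0)
    slopes = []  # slopes[i] holds G(g_i)
    for i in range(len(b)):
        # Stage point g_i = zeta0 + h * sum_{j < i} a_ij * G(g_j)
        g_i = [zeta0[k] + h * sum(a[i][j] * slopes[j][k] for j in range(i))
               for k in range(n)]
        slopes.append(G(g_i))
    return [zeta0[k] + h * sum(b[i] * slopes[i][k] for i in range(len(b)))
            for k in range(n)]


# Classical RK4 coefficients (S = 4 stages); an illustrative choice.
A_RK4 = [[0.0, 0.0, 0.0, 0.0],
         [0.5, 0.0, 0.0, 0.0],
         [0.0, 0.5, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0]]
B_RK4 = [1 / 6, 1 / 3, 1 / 3, 1 / 6]

# Assumed test objective f(x) = 0.5 * sum_k CURV[k] * x_k^2 (mu = min(CURV)).
CURV = [1.0, 10.0]
GAMMA = 2.0 * math.sqrt(min(CURV))  # assumed damping coefficient


def f(x):
    return 0.5 * sum(c * xi * xi for c, xi in zip(CURV, x))


def heavy_ball_field(zeta):
    """First-order form of the ODE: state zeta = (x, v), x' = v, v' = -GAMMA*v - grad f(x)."""
    d = len(CURV)
    x, v = zeta[:d], zeta[d:]
    grad = [c * xi for c, xi in zip(CURV, x)]
    return v + [-GAMMA * vi - gi for vi, gi in zip(v, grad)]


zeta = [1.0, 1.0, 0.0, 0.0]  # x = (1, 1), v = (0, 0)
f_start = f(zeta[:2])
for _ in range(200):  # N = 200 steps of size h = 0.1
    zeta = rk_step(heavy_ball_field, zeta, 0.1, A_RK4, B_RK4)
f_end = f(zeta[:2])
print(f_start, f_end)
```

Because `rk_step` takes the coefficient tables as arguments, a different explicit integrator can be tried by passing other `a` and `b` values without touching the vector field.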

Achieving Acceleration in Distributed Optimization via Direct Discretization of the Heavy-Ball ODE

Accelerated Primal-Dual Algorithms for Distributed Smooth Convex Optimization over Networks

An Accelerated Distributed Method with Inexact Model of Relative Smoothness and Strong Convexity

A Distributed Stochastic Optimization Algorithm with Gradient-Tracking and Distributed Heavy-Ball Acceleration

An Accelerated Algorithm for Distributed Optimization with Barzilai-Borwein Step Sizes

Distributed Accelerated Optimization Algorithms: Insights from an ODE

Accelerated Distributed Aggregative Optimization

Distributed Algorithms for Composite Optimization: Unified Framework and Convergence Analysis

Convergence of an Accelerated Distributed Optimisation Algorithm over Time-varying Directed Networks

Distributed Learning with Convex Sum-of-Non-Convex Objective

An Accelerated Exact Distributed First-Order Algorithm for Optimization over Directed Networks

Distributed Stochastic Consensus Optimization With Momentum for Nonconvex Nonsmooth Problems

Accelerating Distributed Optimization via Fixed-time Convergent Flows: Extensions to Non-convex Functions and Consistent Discretization

Accelerated Alternating Direction Method of Multipliers Gradient Tracking for Distributed Optimization

Accelerated Convergence Algorithm for Distributed Constrained Optimization under Time-Varying General Directed Graphs

Accelerated Dynamical Approaches for a Class of Distributed Optimization with Set Constraints

Distributed Discrete-Time Convex Optimization with Closed Convex Set Constraints: Linearly Convergent Algorithm Design

Accelerated Distributed Optimization over Directed Graphs with Row and Column-Stochastic Matrices

Convergence of Distributed Accelerated Algorithm over Unbalanced Directed Networks

On Accelerating Distributed Convex Optimizations

Random Gradient Extrapolation for Distributed and Stochastic Optimization