Abstract: Consider $n$ agents connected over a network that collaborate to minimize the average of their local cost functions combined with a common nonsmooth function. This paper introduces a unified algorithmic framework for solving such a problem through distributed stochastic proximal gradient methods, leveraging the normal map update scheme. Within this framework, we propose two new algorithms, termed Normal Map-based Distributed Stochastic Gradient Tracking (norM-DSGT) and Normal Map-based Exact Diffusion (norM-ED), to solve the distributed composite optimization problem over a connected network. We demonstrate that both methods can asymptotically achieve convergence rates comparable to that of the centralized stochastic proximal gradient descent method under a general variance condition on the stochastic gradients. Additionally, the number of iterations required for norM-ED to achieve such a rate (i.e., the transient time) behaves as $\mathcal{O}(n^{3}/(1-\lambda)^2)$ for minimizing composite objective functions, matching the performance of the non-proximal ED algorithm. Here $1-\lambda$ denotes the spectral gap of the mixing matrix related to the underlying network topology. To our knowledge, such a convergence result is state-of-the-art for the considered composite problem. Under the same condition, norM-DSGT enjoys a transient time of $\mathcal{O}(\max\{n^3/(1-\lambda)^2, n/(1-\lambda)^4\})$ and behaves more stably than norM-ED under decaying stepsizes on the tested problems.

What problem does this paper attempt to address?

This paper addresses how multiple agents in a distributed network can cooperate to minimize the sum of the average of their local cost functions and a common nonsmooth function. Specifically, it proposes a unified algorithmic framework based on distributed stochastic proximal gradient methods and introduces two new algorithms: Normal Map-based Distributed Stochastic Gradient Tracking (norM-DSGT) and Normal Map-based Exact Diffusion (norM-ED). The goal of these methods is to improve the convergence speed for distributed composite optimization problems, especially over sparse networks.

### Specific Problem Description

Consider a set of agents $N = \{1, 2, \ldots, n\}$ connected over a network, which cooperate to solve the following distributed composite optimization problem:

\[ \min_{x\in\mathbb{R}^p} \psi(x) := f(x) + \varphi(x), \quad f(x) := \frac{1}{n}\sum_{i=1}^n f_i(x), \]

where each agent $i$ can only access its local objective function $f_i:\mathbb{R}^p\rightarrow\mathbb{R}$ and a possibly nonsmooth function $\varphi:\mathbb{R}^p\rightarrow(-\infty,\infty]$. The function $\varphi$ can capture regularization terms, constraints, and penalty terms. For example, when $\varphi$ is the indicator function of a feasible set, the problem above becomes a distributed constrained optimization problem.

### Main Challenges

1. **Use of Stochastic Gradients**: Each agent can only query noisy or stochastic gradients $g_i(x;\xi_i)$ instead of the full gradient $\nabla f_i(x)$.
2. **Impact of Network Topology**: A sparse network (such as a ring graph) significantly slows convergence.
3. **Limitations of Existing Methods**: Most existing distributed stochastic proximal gradient methods converge slowly as the number of agents increases and cannot match the performance of the centralized stochastic proximal gradient method.
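As a point of reference for the composite problem described above, the following is a minimal sketch of the centralized stochastic proximal gradient baseline that the distributed methods are compared against. It is not norM-DSGT or norM-ED. Everything specific here is an assumption made for illustration: the local costs are taken to be $f_i(x) = \tfrac{1}{2}\|x - b_i\|^2$, the nonsmooth term is $\varphi(x) = \mu\|x\|_1$ (whose proximal operator is soft-thresholding), the gradient noise is Gaussian, and all function and parameter names are hypothetical.

```python
import random


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1, applied elementwise."""
    return [max(abs(vi) - tau, 0.0) * (1.0 if vi > 0 else -1.0) for vi in v]


def stochastic_grad(x, b_i, noise_std, rng):
    """Noisy gradient of the assumed local cost f_i(x) = 0.5 * ||x - b_i||^2."""
    return [xj - bj + rng.gauss(0.0, noise_std) for xj, bj in zip(x, b_i)]


def prox_sgd(b, mu, alpha, steps, noise_std=0.0, seed=0):
    """Centralized stochastic proximal gradient on (1/n) sum_i f_i(x) + mu * ||x||_1.

    b is a list of n vectors (one per agent); alpha is a constant stepsize.
    """
    rng = random.Random(seed)
    n, p = len(b), len(b[0])
    x = [0.0] * p
    for _ in range(steps):
        # Average the n agents' stochastic gradients (centralized view).
        g = [0.0] * p
        for b_i in b:
            g_i = stochastic_grad(x, b_i, noise_std, rng)
            g = [gj + gij / n for gj, gij in zip(g, g_i)]
        # Gradient step on the smooth part, then prox step on the nonsmooth part.
        x = soft_threshold([xj - alpha * gj for xj, gj in zip(x, g)], alpha * mu)
    return x
```

With noise switched off, this toy problem has the closed-form minimizer `soft_threshold(mean(b), mu)`, which gives a simple sanity check for the iteration. In the distributed setting no single node can form the averaged gradient `g` directly, which is where the mixing matrix and the network's spectral gap enter.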
### Contributions of the Paper

1. **A New Algorithmic Framework**: The paper proposes a normal map-based distributed stochastic proximal gradient framework (norM-SABC-2) and develops two new algorithms, norM-DSGT and norM-ED, within it.
2. **Improved Transient Time**: For smooth, possibly nonconvex $f$ and weakly convex $\varphi$, norM-DSGT and norM-ED reduce the transient time from $O(n^3/(1-\lambda)^4)$ to $O(\max\{n^3/(1-\lambda)^2, n/(1-\lambda)^4\})$ and $O(n^3/(1-\lambda)^2)$, respectively.
3. **Relaxed Assumptions**: Compared with previous work, the paper only assumes that $\varphi$ is weakly convex and that the stochastic gradients satisfy the ABC condition, without additional data heterogeneity conditions.

With these improvements, the paper shows that the new framework and methods can improve convergence speed and stability for distributed composite optimization, especially over large-scale sparse networks.
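To get a feel for how the transient-time expressions quoted above scale on a sparse network, the sketch below evaluates them for a ring graph. It only illustrates orders of magnitude: the big-O constants are dropped, so the returned numbers are not iteration counts. The ring mixing matrix with weight 1/3 on each node and on each of its two neighbors is an assumed choice (its eigenvalues follow from the standard circulant formula), and the function names are hypothetical.

```python
import math


def ring_lambda(n):
    """Second-largest eigenvalue magnitude of an n-node ring mixing matrix
    with weight 1/3 on self and on each neighbor (assumed weights, n >= 3)."""
    eigs = [(1.0 + 2.0 * math.cos(2.0 * math.pi * k / n)) / 3.0
            for k in range(1, n)]
    return max(abs(e) for e in eigs)


def transient_bounds(n, lam):
    """Evaluate the transient-time expressions with all constants dropped."""
    gap = 1.0 - lam  # spectral gap of the mixing matrix
    return {
        "norM-ED": n**3 / gap**2,
        "norM-DSGT": max(n**3 / gap**2, n / gap**4),
        "prior": n**3 / gap**4,
    }


if __name__ == "__main__":
    for n in (10, 50, 100):
        lam = ring_lambda(n)
        print(n, round(1.0 - lam, 5), transient_bounds(n, lam))
```

Because the spectral gap of a ring shrinks as $n$ grows, the $1/(1-\lambda)^4$ terms grow much faster than the $1/(1-\lambda)^2$ terms, which is why the exponent on the spectral gap matters most on sparse topologies.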

Distributed Normal Map-based Stochastic Proximal Gradient Methods over Networks

An Accelerated Distributed Stochastic Gradient Method with Momentum

Accelerated Primal-Dual Algorithms for Distributed Smooth Convex Optimization over Networks

A Push-Pull Gradient Method for Distributed Optimization in Networks.

Stochastic Gradient Tracking Methods for Distributed Personalized Optimization over Networks

Distributed Adaptive Gradient Algorithm with Gradient Tracking for Stochastic Non-Convex Optimization

Improving the Transient Times for Distributed Stochastic Gradient Methods

Convergence in High Probability of Distributed Stochastic Gradient Descent Algorithms

Augmented Distributed Gradient Methods for Multi-Agent Optimization under Uncoordinated Constant Stepsizes

Distributed Stochastic Algorithm for Global Optimization in Networked System

Distributed Stochastic Consensus Optimization With Momentum for Nonconvex Nonsmooth Problems

Distributed Algorithms for Composite Optimization: Unified Framework and Convergence Analysis

Convergence of a Normal Map-based Prox-SGD Method under the KL Inequality

A Unified Contraction Analysis of a Class of Distributed Algorithms for Composite Optimization

Decentralized stochastic subgradient projection optimization algorithms over random networks

A Communication-Efficient Stochastic Gradient Descent Algorithm for Distributed Nonconvex Optimization

Distributed Optimization Based on Gradient-tracking Revisited: Enhancing Convergence Rate via Surrogation

Distributed Stochastic Optimization with Gradient Tracking over Time-Varying Directed Networks

Decentralized Stochastic Subgradient Methods for Nonsmooth Nonconvex Optimization

Provably Accelerated Decentralized Gradient Method Over Unbalanced Directed Graphs