Distributed Normal Map-based Stochastic Proximal Gradient Methods over Networks

Kun Huang, Shi Pu, Angelia Nedić
2024-12-18
Abstract: Consider $n$ agents connected over a network that collaborate to minimize the average of their local cost functions combined with a common nonsmooth function. This paper introduces a unified algorithmic framework for solving such a problem through distributed stochastic proximal gradient methods, leveraging the normal map update scheme. Within this framework, we propose two new algorithms, termed Normal Map-based Distributed Stochastic Gradient Tracking (norM-DSGT) and Normal Map-based Exact Diffusion (norM-ED), to solve the distributed composite optimization problem over a connected network. We demonstrate that both methods can asymptotically achieve convergence rates comparable to that of the centralized stochastic proximal gradient descent method under a general variance condition on the stochastic gradients. Additionally, the number of iterations required for norM-ED to achieve such a rate (i.e., the transient time) behaves as $\mathcal{O}(n^{3}/(1-\lambda)^2)$ for minimizing composite objective functions, matching the performance of the non-proximal ED algorithm. Here $1-\lambda$ denotes the spectral gap of the mixing matrix related to the underlying network topology. To our knowledge, such a convergence result is state-of-the-art for the considered composite problem. Under the same condition, norM-DSGT enjoys a transient time of $\mathcal{O}(\max\{n^3/(1-\lambda)^2, n/(1-\lambda)^4\})$ and behaves more stably than norM-ED under decaying stepsizes for solving the tested problems.
Optimization and Control
What problem does this paper attempt to address?
The paper studies how multiple agents in a distributed network can cooperate to minimize the sum of the average of their local cost functions and a common non-smooth function. It proposes a unified algorithmic framework based on distributed stochastic proximal gradient methods and introduces two new algorithms: normal map-based distributed stochastic gradient tracking (norM-DSGT) and normal map-based exact diffusion (norM-ED). The goal of these methods is to improve the convergence speed for distributed composite optimization problems, especially over sparse networks.

### Specific Problem Description

Consider a set of agents \( N=\{1, 2,\ldots, n\} \) connected in a network, which cooperate to solve the following distributed composite optimization problem:

\[ \min_{x\in\mathbb{R}^p}\psi(x) := f(x)+\varphi(x),\quad f(x):=\frac{1}{n}\sum_{i = 1}^n f_i(x), \]

where each agent \( i \) can only access its local objective function \( f_i:\mathbb{R}^p\rightarrow\mathbb{R} \) and a possibly non-smooth function \( \varphi:\mathbb{R}^p\rightarrow(-\infty,\infty] \). The function \( \varphi \) can capture regularization terms, constraints, and penalty terms. For example, when \( \varphi \) is the indicator function of a feasible set, this problem becomes a distributed constrained optimization problem.

### Main Challenges

1. **Use of Stochastic Gradients**: Each agent can only query noisy or stochastic gradients \( g_i(x;\xi_i) \) instead of the full gradient \( \nabla f_i(x) \).
2. **Impact of Network Topology**: A sparse network (such as a ring graph) leads to a significant decrease in convergence speed.
3. **Limitations of Existing Methods**: Most existing distributed stochastic proximal gradient methods converge slowly as the number of agents increases and cannot match the performance of the centralized stochastic proximal gradient method.

### Contributions of the Paper

1. **Introduction of a New Algorithm Framework**: The paper proposes the normal map-based distributed stochastic proximal gradient method (norM-SABC-2) and develops two new algorithms, norM-DSGT and norM-ED, within this framework.
2. **Improvement of Transient Time**: For smooth and possibly non-convex \( f \) and weakly convex \( \varphi \), norM-DSGT and norM-ED reduce the transient time from \( O(n^3/(1 - \lambda)^4) \) to \( O(\max\{n^3/(1 - \lambda)^2,n/(1 - \lambda)^4\}) \) and \( O(n^3/(1 - \lambda)^2) \), respectively.
3. **Relaxation of Assumptions**: Compared with previous work, the paper only assumes that \( \varphi \) is weakly convex and that the stochastic gradients satisfy the ABC condition, without additional data heterogeneity conditions.

With these improvements, the paper shows that the new framework and methods can significantly improve convergence speed and stability for distributed composite optimization problems, especially in large-scale sparse networks.
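
The roles of the non-smooth term \( \varphi \) and of the normal map can be illustrated on a toy single-agent instance. The sketch below is not the paper's norM-DSGT or norM-ED: it has no network and no mixing matrix. It assumes the standard normal map \( F_{\mathrm{nor}}(z) = \nabla f(\mathrm{prox}_{\gamma\varphi}(z)) + (z - \mathrm{prox}_{\gamma\varphi}(z))/\gamma \), and picks \( f(x) = \tfrac{1}{2}\|x-b\|^2 \) and \( \varphi(x) = \mu\|x\|_1 \) purely for illustration. All function names and parameter values are hypothetical.

```python
import numpy as np


def prox_l1(z, thresh):
    """Soft-thresholding: proximal operator of thresh * ||x||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)


def run_normal_map_sgd(b, mu, gamma=0.5, alpha=0.1, iters=500,
                       noise_std=0.0, seed=0):
    """Single-agent normal map-based stochastic proximal gradient sketch.

    Illustrative assumptions (not taken from the paper):
      f(x)   = 0.5 * ||x - b||^2, so grad f(x) = x - b
      phi(x) = mu * ||x||_1
      the stochastic gradient is grad f(x) plus Gaussian noise
    """
    rng = np.random.default_rng(seed)
    b = np.asarray(b, dtype=float)
    z = np.zeros_like(b)  # auxiliary variable updated by the normal map
    for _ in range(iters):
        x = prox_l1(z, gamma * mu)               # x = prox_{gamma * phi}(z)
        noise = noise_std * rng.standard_normal(b.shape)
        g = (x - b) + noise                      # noisy gradient of f at x
        z = z - alpha * (g + (z - x) / gamma)    # step along the normal map
    return prox_l1(z, gamma * mu)


if __name__ == "__main__":
    b = np.array([3.0, -0.5, 1.5, 0.2])
    print(run_normal_map_sgd(b, mu=1.0))                  # noise-free run
    print(run_normal_map_sgd(b, mu=1.0, noise_std=0.1))   # noisy gradients
```

For this toy problem the minimizer of \( f + \varphi \) is the soft-thresholding of \( b \) at level \( \mu \), so with `noise_std=0` and `b = [3, -0.5, 1.5, 0.2]`, `mu = 1` the returned point approaches `[2, 0, 0.5, 0]`. With a constant stepsize and noisy gradients the iterates only hover near that point. Decaying stepsizes, which the abstract mentions, are not implemented in this sketch.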