Abstract: We focus on decentralized stochastic non-convex optimization, where $n$ agents work together to optimize a composite objective function which is a sum of a smooth term and a non-smooth convex term. To solve this problem, we propose two single-time scale algorithms: Prox-DASA and Prox-DASA-GT. These algorithms can find $\epsilon$-stationary points in $\mathcal{O}(n^{-1}\epsilon^{-2})$ iterations using constant batch sizes (i.e., $\mathcal{O}(1)$). Unlike prior work, our algorithms achieve comparable complexity without requiring large batch sizes, more complex per-iteration operations (such as double loops), or stronger assumptions. Our theoretical findings are supported by extensive numerical experiments, which demonstrate the superiority of our algorithms over previous approaches. Our code is available at https://github.com/xuxingc/ProxDASA.

What problem does this paper attempt to address?

This paper seeks an efficient method for decentralized non-convex stochastic composite optimization that does not rely on large batches, complex per-iteration operations (such as double loops), or stronger assumptions. Specifically, it considers a decentralized setting in which multiple agents collaborate to optimize a composite objective function $\Phi(x)=F(x)+\Psi(x)$, where $F(x)$ is the smooth term and $\Psi(x)$ is the non-smooth convex term.

### Problem Background

In decentralized stochastic non-convex optimization, existing methods usually require large batches to ensure convergence or rely on complex variance-reduction techniques, which increase algorithmic complexity and computational cost. Many existing methods also require stronger assumptions, such as mean-square smoothness, which limits their scope of application.

### Core Contributions of the Paper

To address these issues, the authors propose two single-time-scale decentralized proximal stochastic algorithms: Prox-DASA and Prox-DASA-GT. Their main features are:

1. **Convergence**: Prox-DASA converges in both homogeneous and bounded-heterogeneity settings, while Prox-DASA-GT handles more general decentralized heterogeneous problems.
2. **Sample Complexity**: Both algorithms find $\epsilon$-stationary points in $\mathcal{O}(n^{-1}\epsilon^{-2})$ iterations. Each agent uses only a constant number of stochastic gradient samples and performs $m$ rounds of communication per iteration. Setting $m = \lceil\frac{1}{\sqrt{1-\rho}}\rceil$ yields a topology-independent transient time.
3. **Experimental Verification**: In extensive numerical experiments, the authors report that their algorithms outperform previous approaches.

### Formula Summary

- Composite objective function: $\Phi(x)=F(x)+\Psi(x)$, where $F(x)=\frac{1}{n}\sum_{i=1}^{n}F_i(x)$.
- Local objective: $F_i(x)=\mathbb{E}_{\xi_i\sim D_i}[G_i(x,\xi_i)]$, the expectation of the stochastic function $G_i$ over agent $i$'s data distribution $D_i$.
- Definition of an $\epsilon$-stationary point: $\mathbb{E}[\|\mathcal{G}(\bar{x},\nabla F(\bar{x}),\gamma)\|^2]\leq\epsilon$, where $\mathcal{G}(x,z,\gamma)=\frac{1}{\gamma}(x-\text{prox}_{\gamma\Psi}(x-\gamma z))$ is the proximal gradient mapping (written $\mathcal{G}$ here to keep it distinct from the stochastic function $G_i$).
- Sample complexity: Under the paper's assumptions, the algorithms find $\epsilon$-stationary points in $\mathcal{O}(n^{-1}\epsilon^{-2})$ iterations with $\mathcal{O}(1)$ samples per agent per iteration.

With these properties, Prox-DASA and Prox-DASA-GT combine strong theoretical guarantees with good empirical performance, particularly on large-scale distributed optimization problems.
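
To make the stationarity measure and the communication-round rule above concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes, purely for illustration, that the non-smooth term is $\Psi(x)=\lambda\|x\|_1$, whose proximal operator is soft-thresholding, and it treats $\rho$ as a given number in $[0, 1)$. The function names `prox_l1`, `gradient_mapping`, and `communication_rounds` are hypothetical.

```python
import math

import numpy as np


def prox_l1(v, thresh):
    """Proximal operator of thresh * ||.||_1, i.e. soft-thresholding.

    Assumption: Psi is an L1 penalty; the paper allows any convex Psi.
    """
    return np.sign(v) * np.maximum(np.abs(v) - thresh, 0.0)


def gradient_mapping(x, z, gamma, lam):
    """Proximal gradient mapping (x - prox_{gamma*Psi}(x - gamma*z)) / gamma.

    Here Psi = lam * ||.||_1 (an illustrative choice), x is the current
    point, z stands in for the gradient of the smooth term, and gamma > 0
    is the step size.
    """
    return (x - prox_l1(x - gamma * z, gamma * lam)) / gamma


def communication_rounds(rho):
    """Number of rounds m = ceil(1 / sqrt(1 - rho)) for 0 <= rho < 1."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must satisfy 0 <= rho < 1")
    return math.ceil(1.0 / math.sqrt(1.0 - rho))


if __name__ == "__main__":
    x = np.zeros(3)
    z = np.array([0.5, -0.2, 0.0])
    # With lam = 1.0 every |z_i| <= lam, so x = 0 is stationary and the
    # squared norm of the mapping is zero.
    g = gradient_mapping(x, z, gamma=0.1, lam=1.0)
    print("squared norm of mapping:", float(np.dot(g, g)))
    print("rounds for rho = 0.75:", communication_rounds(0.75))
```

When `lam` is zero the proximal operator is the identity, so the mapping reduces to `z` itself; this is the sense in which the squared norm of the mapping generalizes the squared gradient norm used as a stationarity measure in smooth optimization.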

A One-Sample Decentralized Proximal Algorithm for Non-Convex Stochastic Composite Optimization

Decentralized Dual Proximal Gradient Algorithms for Non-Smooth Constrained Composite Optimization Problems

An Edge-based Stochastic Proximal Gradient Algorithm for Decentralized Composite Optimization

Distributed Algorithms for Composite Optimization: Unified Framework and Convergence Analysis

A Proximal Gradient Algorithm for Decentralized Composite Optimization

A Distributed Stochastic Proximal-Gradient Algorithm for Composite Optimization

A Unified Contraction Analysis of a Class of Distributed Algorithms for Composite Optimization

A Unified Algorithmic Framework for Distributed Composite Optimization

Asynchronous Decentralized Accelerated Stochastic Gradient Descent

Asynchronous Proximal Stochastic Gradient Algorithm for Composition Optimization Problems

Asynchronous Stochastic Proximal Methods for Nonconvex Nonsmooth Optimization

A Decentralized Stochastic Algorithm for Coupled Composite Optimization with Linear Convergence

A Computation-Efficient Decentralized Algorithm for Composite Constrained Optimization

An Accelerated Decentralized Stochastic Optimization Algorithm with Inexact Model

Linearized Proximal Algorithms with Adaptive Stepsizes for Convex Composite Optimization with Applications

Decentralized Triple Proximal Splitting Algorithm with Uncoordinated Stepsizes for Nonsmooth Composite Optimization Problems

Asynchronous Distributed Nonsmooth Composite Optimization Via Computation-Efficient Primal-Dual Proximal Algorithms

SignProx: One-bit Proximal Algorithm for Nonconvex Stochastic Optimization

A Proximal Gradient Algorithm for Decentralized Nondifferentiable Optimization

Prox-DBRO-VR: A Unified Analysis on Decentralized Byzantine-Resilient Composite Stochastic Optimization with Variance Reduction and Non-Asymptotic Convergence Rates

Decentralized Primal-Dual Proximal Operator Algorithm for Constrained Nonsmooth Composite Optimization Problems over Networks