A One-Sample Decentralized Proximal Algorithm for Non-Convex Stochastic Composite Optimization

Tesi Xiao, Xuxing Chen, Krishnakumar Balasubramanian, Saeed Ghadimi
2023-06-23
Abstract: We focus on decentralized stochastic non-convex optimization, where $n$ agents work together to optimize a composite objective function which is a sum of a smooth term and a non-smooth convex term. To solve this problem, we propose two single-time-scale algorithms: Prox-DASA and Prox-DASA-GT. These algorithms can find $\epsilon$-stationary points in $\mathcal{O}(n^{-1}\epsilon^{-2})$ iterations using constant batch sizes (i.e., $\mathcal{O}(1)$). Unlike prior work, our algorithms achieve comparable complexity without requiring large batch sizes, more complex per-iteration operations (such as double loops), or stronger assumptions. Our theoretical findings are supported by extensive numerical experiments, which demonstrate the superiority of our algorithms over previous approaches. Our code is available at https://github.com/xuxingc/ProxDASA.
Optimization and Control, Distributed, Parallel, and Cluster Computing, Machine Learning
What problem does this paper attempt to address?
The paper seeks an efficient method for decentralized non-convex stochastic composite optimization that does not rely on large batches, complex per-iteration operations (such as double loops), or stronger assumptions. Specifically, it considers a decentralized setting in which \(n\) agents collaborate to optimize a composite objective function \(\Phi(x)=F(x)+\Psi(x)\), where \(F(x)\) is the smooth (possibly non-convex) term and \(\Psi(x)\) is the non-smooth convex term.

### Problem Background

In decentralized stochastic non-convex optimization, existing methods usually require large batches to guarantee convergence or rely on complex variance-reduction techniques, both of which increase the algorithm's complexity and computational cost. Many existing methods also require stronger assumptions, such as mean-squared smoothness, which limits their applicability.

### Core Contributions of the Paper

To address these issues, the authors propose two single-time-scale decentralized proximal stochastic algorithms, Prox-DASA and Prox-DASA-GT:

1. **Convergence**: Prox-DASA converges in both homogeneous and bounded-heterogeneity settings, while Prox-DASA-GT applies to general heterogeneous decentralized problems.
2. **Sample complexity**: Both algorithms find \(\epsilon\)-stationary points in \(O(n^{-1}\epsilon^{-2})\) iterations. In each iteration, every agent uses only a constant number of stochastic gradient samples and performs \(m\) rounds of communication. Setting \(m=\lceil 1/\sqrt{1-\rho}\rceil\), where \(\rho\in[0,1)\) is a connectivity parameter determined by the network's mixing matrix, yields a topology-independent transient time.
3. **Experimental verification**: Extensive numerical experiments show that both algorithms outperform previous approaches.

### Formula Summary

- Composite objective: \(\Phi(x)=F(x)+\Psi(x)\), where \(F(x)=\frac{1}{n}\sum_{i=1}^{n}F_i(x)\).
- Local objectives: \(F_i(x)=\mathbb{E}_{\xi_i\sim D_i}[G_i(x,\xi_i)]\), where \(D_i\) is the data distribution of agent \(i\) and \(G_i(x,\xi_i)\) is its stochastic loss.
- \(\epsilon\)-stationary point: \(\mathbb{E}[\|G(\bar{x},\nabla F(\bar{x}),\gamma)\|^2]\leq\epsilon\), where \(\bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_i\) is the average of the agents' iterates and \(G(x,z,\gamma)=\frac{1}{\gamma}\left(x-\text{prox}_{\gamma\Psi}(x-\gamma z)\right)\) is the gradient mapping, with \(\text{prox}_{\gamma\Psi}(y)=\arg\min_{u}\{\Psi(u)+\frac{1}{2\gamma}\|u-y\|^2\}\).
- Sample complexity: under the paper's assumptions, both algorithms reach an \(\epsilon\)-stationary point with a per-agent sample complexity of \(O(n^{-1}\epsilon^{-2})\), since each agent draws \(O(1)\) samples per iteration.

Taken together, Prox-DASA and Prox-DASA-GT match the complexity of prior methods while using constant batch sizes, single-loop updates, and no stronger assumptions, and they perform well in the reported experiments.
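The gradient mapping and the communication-round choice above can be made concrete with a small sketch. It assumes, purely for illustration, \(\Psi(x)=\lambda\|x\|_1\) (whose prox is coordinate-wise soft-thresholding), hypothetical quadratic local objectives \(F_i(x)=\frac{1}{2}\|x-a_i\|^2\), four agents on a ring with plain gossip as the communication step, and \(\rho=1/3\) (the second-largest eigenvalue magnitude of this ring's mixing matrix; the paper's exact definition of \(\rho\) may differ). None of these choices or helper names come from the paper, and the code does not reproduce the Prox-DASA updates; it only computes \(m=\lceil 1/\sqrt{1-\rho}\rceil\), runs \(m\) gossip rounds, and evaluates \(\|G(\bar{x},\nabla F(\bar{x}),\gamma)\|^2\) at the network average.

```python
import math

# Hypothetical setup for illustration only: Psi(x) = lam * ||x||_1 and
# F_i(x) = 0.5 * ||x - a_i||^2. Neither choice comes from the paper.


def prox_l1(y, t):
    """prox_{t*||.||_1}(y): coordinate-wise soft-thresholding at level t."""
    return [math.copysign(max(abs(v) - t, 0.0), v) for v in y]


def gradient_mapping(x, z, gamma, lam):
    """G(x, z, gamma) = (x - prox_{gamma*Psi}(x - gamma*z)) / gamma, with Psi = lam*||.||_1."""
    y = [xi - gamma * zi for xi, zi in zip(x, z)]
    p = prox_l1(y, gamma * lam)
    return [(xi - pi) / gamma for xi, pi in zip(x, p)]


def num_gossip_rounds(rho):
    """m = ceil(1 / sqrt(1 - rho)); requires 0 <= rho < 1."""
    return math.ceil(1.0 / math.sqrt(1.0 - rho))


def gossip(X, W, m):
    """Hypothetical communication step: m rounds of x_i <- sum_j W[i][j] * x_j."""
    n, d = len(X), len(X[0])
    for _ in range(m):
        X = [[sum(W[i][j] * X[j][k] for j in range(n)) for k in range(d)]
             for i in range(n)]
    return X


def average(X):
    """Network average x_bar = (1/n) * sum_i x_i."""
    n = len(X)
    return [sum(col) / n for col in zip(*X)]


def consensus_error(X):
    """Sum over agents of ||x_i - x_bar||^2."""
    xbar = average(X)
    return sum((v - c) ** 2 for x in X for v, c in zip(x, xbar))


def sq_norm(v):
    return sum(c * c for c in v)


# Four agents on a ring; each mixes itself and its two neighbors with weight 1/3,
# so W is symmetric and doubly stochastic.
n = 4
W = [[1 / 3 if (j - i) % n in (0, 1, n - 1) else 0.0 for j in range(n)]
     for i in range(n)]
A = [[1.0, 0.0], [3.0, -1.0], [-1.0, 2.0], [1.0, 3.0]]  # hypothetical targets a_i
lam, gamma = 0.5, 0.5
rho = 1 / 3  # illustrative value: second-largest |eigenvalue| of this W

m = num_gossip_rounds(rho)
X0 = [row[:] for row in A]  # hypothetical initial local iterates
X = gossip(X0, W, m)
xbar = average(X)
a_bar = average(A)
grad_F = [xb - ab for xb, ab in zip(xbar, a_bar)]  # grad of F = (1/n) sum_i F_i at xbar
G = gradient_mapping(xbar, grad_F, gamma, lam)

print("m =", m)
print("consensus error:", consensus_error(X0), "->", consensus_error(X))
print("||G(xbar, grad F(xbar), gamma)||^2 =", sq_norm(G))
```

Because \(W\) is doubly stochastic, gossip leaves the network average unchanged while shrinking the consensus error. The gradient mapping vanishes at minimizers of \(\Phi\): for these quadratics the minimizer is `prox_l1(a_bar, lam)`, where \(G\) is zero, whereas at the post-gossip average it is not, so \(\|G\|^2\) serves as the stationarity measure.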