Abstract: Consider the consensus problem of minimizing $f(x) = \sum_{i=1}^{n} f_i(x)$, where $x \in \mathbb{R}^p$ and each $f_i$ is known only to the individual agent $i$ in a connected network of $n$ agents. To solve this problem, all the agents collaborate with their neighbors through information exchange. This type of decentralized computation does not need a fusion center, offers better network load balance, and improves data privacy. This paper studies the decentralized gradient descent method [A. Nedić and A. Ozdaglar, IEEE Trans. Automat. Control, 54 (2009), pp. 48-61], in which each agent $i$ updates its local variable $x_{(i)} \in \mathbb{R}^p$ by combining the average of its neighbors' variables with a local negative-gradient step $-\alpha \nabla f_i(x_{(i)})$. The method is described by the iteration $x_{(i)}(k+1) \leftarrow \sum_{j=1}^{n} w_{ij} x_{(j)}(k) - \alpha \nabla f_i(x_{(i)}(k))$, for each agent $i$, where $w_{ij}$ is nonzero only if $i$ and $j$ are neighbors or $i = j$, and the matrix $W = [w_{ij}] \in \mathbb{R}^{n \times n}$ is symmetric and doubly stochastic. This paper analyzes the convergence of this iteration and derives its rate of convergence under the assumption that each $f_i$ is proper closed convex and lower bounded, $\nabla f_i$ is Lipschitz continuous with constant $L_{f_i} > 0$, and the stepsize $\alpha$ is fixed. Provided that $\alpha \le \min\{(1 + \lambda_n(W))/L_h,\ 1/L_{\bar f}\}$, where $L_h = \max_i \{L_{f_i}\}$ and $L_{\bar f} = \frac{1}{n} \sum_{i=1}^{n} L_{f_i}$, the objective errors of all the local solutions and the networkwide mean solution decrease at rates of $O(1/k)$ until they reach a level of $O(\alpha)$. If the $f_i$ are strongly convex with moduli $\mu_{f_i}$, and $\alpha \le \min\{(1 + \lambda_n(W))/L_h,\ 1/(L_{\bar f} + \mu_f)\}$, where $\mu_f = \frac{1}{n} \sum_{i=1}^{n} \mu_{f_i}$, then all the local solutions and the mean solution converge to the global minimizer $x^*$ at a linear rate until reaching an $O(\alpha)$-neighborhood of $x^*$.
We also develop an iteration for decentralized basis pursuit and establish its linear convergence to an $O(\alpha)$-neighborhood of the true sparse signal. This analysis reveals how the convergence of $x_{(i)}(k+1) \leftarrow \sum_{j=1}^{n} w_{ij} x_{(j)}(k) - \alpha \nabla f_i(x_{(i)}(k))$, for each agent $i$, depends on the stepsize, function convexity, and network spectrum.
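
The decentralized gradient descent iteration described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the scalar case $p = 1$, the quadratic local objectives $f_i(x) = \frac{1}{2}(x - b_i)^2$ (so the global minimizer is the mean of the $b_i$), the ring network with uniform weights of 1/3, and the helper names `ring_weights` and `dgd`.

```python
def ring_weights(n):
    """Mixing matrix W for a ring of n agents (illustrative choice).

    Each agent averages itself and its two ring neighbors with weight 1/3,
    which makes W symmetric and doubly stochastic for n >= 3.
    """
    if n < 3:
        raise ValueError("ring_weights assumes at least 3 agents")
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][i] = 1.0 / 3.0
        W[i][(i - 1) % n] = 1.0 / 3.0
        W[i][(i + 1) % n] = 1.0 / 3.0
    return W


def dgd(W, grads, x0, alpha, iters):
    """Run x_i(k+1) = sum_j W[i][j] * x_j(k) - alpha * grad_i(x_i(k)).

    grads[i] is the gradient of agent i's local objective; each agent only
    evaluates its own gradient at its own local variable.
    """
    x = list(x0)
    n = len(x)
    for _ in range(iters):
        # Consensus step: weighted average over neighbors (and self).
        mixed = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Local negative-gradient step with fixed stepsize alpha.
        x = [mixed[i] - alpha * grads[i](x[i]) for i in range(n)]
    return x


# Hypothetical local data: f_i(x) = 0.5 * (x - b_i)^2, so grad f_i(x) = x - b_i.
b = [1.0, 2.0, 3.0, 4.0, 5.0]
grads = [lambda x, b_i=b_i: x - b_i for b_i in b]
W = ring_weights(len(b))
x_star = sum(b) / len(b)  # minimizer of sum_i f_i

x_coarse = dgd(W, grads, [0.0] * len(b), alpha=0.05, iters=1000)
x_fine = dgd(W, grads, [0.0] * len(b), alpha=0.005, iters=10000)

err_coarse = max(abs(x - x_star) for x in x_coarse)
err_fine = max(abs(x - x_star) for x in x_fine)
print("max local error, alpha=0.05 :", err_coarse)
print("max local error, alpha=0.005:", err_fine)
```

With a fixed stepsize the local variables in this toy problem settle near, but not exactly at, the global minimizer, and the gap shrinks as the stepsize decreases. This is consistent with the $O(\alpha)$-neighborhood behavior stated in the abstract, though a single small example is not evidence for the general rates.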

Gradient descent with nonconvex constraints: local concavity determines convergence

Convergence of Projected Subgradient Method with Sparse or Low-Rank Constraints

Convergence Analysis of Gradient Algorithms on Riemannian Manifolds Without Curvature Constraints and Application to Riemannian Mass

Convergence of Inexact Steepest Descent Algorithm for Multiobjective Optimizations on Riemannian Manifolds Without Curvature Constraints

Fast low-rank estimation by projected gradient descent: General statistical and algorithmic guarantees

No Spurious Local Minima in Nonconvex Low Rank Problems: A Unified Geometric Analysis

Convergence analysis of the Gauss–Newton method for convex inclusion and convex-composite optimization problems

On Linear Convergence of Non-Euclidean Gradient Methods Without Strong Convexity and Lipschitz Gradient Continuity

Convergence Properties of Nonlinear Conjugate Gradient Methods

Convex and Non-convex Optimization Under Generalized Smoothness

Convergence and Trade-Offs in Riemannian Gradient Descent and Riemannian Proximal Point

Concavifiability and convergence: necessary and sufficient conditions for gradient descent analysis

Inexact Riemannian Gradient Descent Method for Nonconvex Optimization

Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion, and Blind Deconvolution

Nonlinear Conjugate Gradient Methods for Optimization of Set-Valued Mappings of Finite Cardinality

Gradient methods for convex minimization: better rates under weaker conditions

High Probability Convergence Bounds for Non-convex Stochastic Gradient Descent with Sub-Weibull Noise

Accelerated Gradient Method for A Class of Nonconvex Low Rank Problem: Essentially Matching the Optimal Convex Convergence Rate

On the Convergence of Decentralized Gradient Descent

Gauss-Southwell type descent methods for low-rank matrix optimization

Projected Gradient Descent Algorithm for Low-Rank Matrix Estimation