Abstract: In this paper, we study a variant of the quadratic penalty method for linearly constrained convex problems that is widely used but lacks theoretical justification. In this variant, the penalty parameter steadily increases and the penalized objective function is minimized inexactly rather than exactly, e.g., with only one step of proximal gradient descent. We give counterexamples showing that this variant may fail to produce a solution to the original constrained problem. By choosing special penalty parameters, we ensure convergence and establish convergence rates of $O\left(\frac{1}{\sqrt{K}}\right)$ for general convex problems and $O\left(\frac{1}{K}\right)$ for strongly convex ones, where $K$ is the number of iterations. Furthermore, by adopting Nesterov's extrapolation, we show that the convergence rates can be improved to $O\left(\frac{1}{K}\right)$ for general convex problems and $O\left(\frac{1}{K^2}\right)$ for strongly convex ones. When applied to decentralized distributed optimization, the penalty methods studied in this paper become the widely used distributed gradient method and the fast distributed gradient method. Because our analysis framework is entirely different, we can improve their $O\left(\frac{\log K}{\sqrt{K}}\right)$ and $O\left(\frac{\log K}{K}\right)$ convergence rates to $O\left(\frac{1}{\sqrt{K}}\right)$ and $O\left(\frac{1}{K}\right)$, with fewer assumptions on the network topology, for general convex problems. Using our analysis framework, we also extend the fast distributed gradient method to a communication-efficient version, i.e., one that finds an $\varepsilon$-solution in $O\left(\frac{1}{\varepsilon}\right)$ communications and $O\left(\frac{1}{\varepsilon^{2+\delta}}\right)$ computations for non-smooth problems, where $\delta$ is a small constant.
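The variant described in the abstract can be illustrated with a minimal sketch: at iteration $k$, take a single gradient step on the penalized objective $f(x) + \frac{\beta_k}{2}\|Ax - b\|^2$ and then increase $\beta_k$. Everything specific below is an assumption made for illustration and is not taken from the paper: the function name `penalty_gradient_method`, the linear schedule $\beta_k = \beta_0 (k+1)$, the step size $1/(L_f + \beta_k \|A\|_2^2)$, and the toy problem. Because the toy objective is smooth, the proximal gradient step reduces to a plain gradient step.

```python
import numpy as np


def penalty_gradient_method(grad_f, L_f, A, b, x0, num_iters, beta0=0.1):
    """One gradient step per iteration on f(x) + (beta_k / 2) * ||A x - b||^2.

    The schedule beta_k = beta0 * (k + 1) and the step size below are
    illustrative assumptions, not the parameters analyzed in the paper.
    """
    x = np.asarray(x0, dtype=float).copy()
    sigma_sq = np.linalg.norm(A, 2) ** 2  # squared largest singular value of A
    for k in range(num_iters):
        beta_k = beta0 * (k + 1)  # assumed increasing penalty parameter
        # Inverse of the smoothness constant of the penalized objective.
        step = 1.0 / (L_f + beta_k * sigma_sq)
        grad = grad_f(x) + beta_k * (A.T @ (A @ x - b))
        x = x - step * grad  # single inexact minimization step
    return x


# Hypothetical toy problem: minimize 0.5 * ||x - c||^2 subject to x1 + x2 = 1.
c = np.array([2.0, 0.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
x_star = np.array([1.5, -0.5])  # projection of c onto the constraint line


def grad_f(x):
    # Gradient of 0.5 * ||x - c||^2; its Lipschitz constant is L_f = 1.
    return x - c


x_short = penalty_gradient_method(grad_f, 1.0, A, b, np.zeros(2), num_iters=200)
x_long = penalty_gradient_method(grad_f, 1.0, A, b, np.zeros(2), num_iters=2000)

viol_short = float(np.linalg.norm(A @ x_short - b))
viol_long = float(np.linalg.norm(A @ x_long - b))
err_long = float(np.linalg.norm(x_long - x_star))
print(viol_short, viol_long, err_long)
```

In this toy run, the constraint violation shrinks as the number of iterations (and hence the final penalty parameter) grows. This illustrates the mechanism only; it says nothing about the rates claimed in the abstract, which depend on the paper's specific parameter choices.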

Convergence Rates Analysis of The Quadratic Penalty Method and Its Applications to Decentralized Distributed Optimization

Decentralized Accelerated Gradient Methods with Increasing Penalty Parameters

Convergence Rate of a Penalty Method for Strongly Convex Problems with Linear Constraints

Convergence in High Probability of Distributed Stochastic Gradient Descent Algorithms

Distributed Algorithms for Composite Optimization: Unified Framework and Convergence Analysis

A Decentralized Proximal Point-type Method for Non-convex Non-concave Saddle Point Problems

Distributed Zeroth-Order Optimization: Convergence Rates That Match Centralized Counterpart

A Communication-Efficient Decentralized Newton's Method with Provably Faster Convergence

On Convergence Rates of Linearized Proximal Algorithms for Convex Composite Optimization with Applications

Huber Loss-Based Penalty Approach to Problems with Linear Constraints

On the Divergence of Decentralized Non-Convex Optimization

Can Decentralized Stochastic Minimax Optimization Algorithms Converge Linearly for Finite-Sum Nonconvex-Nonconcave Problems?

On the convergence analysis of the decentralized projected gradient descent method

Distributed adaptive Newton methods with global superlinear convergence

On Convergence of Distributed Approximate Newton Methods: Globalization, Sharper Bounds and Beyond

Distributed Optimization Algorithm with Superlinear Convergence Rate

Double Averaging and Gradient Projection: Convergence Guarantees for Decentralized Constrained Optimization

A class of smooth exact penalty function methods for optimization problems with orthogonality constraints

Decentralized non-convex optimization via bi-level SQP and ADMM

Global Convergence Analysis of the Power Proximal Point and Augmented Lagrangian Method

Computational Convergence Analysis of Distributed Gradient Tracking for Smooth Convex Optimization Using Dissipativity Theory