Abstract: This paper considers distributed convex-concave minimax optimization under second-order similarity. We propose the stochastic variance-reduced optimistic gradient sliding (SVOGS) method, which takes advantage of the finite-sum structure of the objective through mini-batch client sampling and variance reduction. We prove that SVOGS achieves an $\varepsilon$-duality gap within ${\mathcal O}(\delta D^2/\varepsilon)$ communication rounds, ${\mathcal O}(n+\sqrt{n}\delta D^2/\varepsilon)$ communication complexity, and $\tilde{\mathcal O}(n+(\sqrt{n}\delta+L)D^2/\varepsilon\log(1/\varepsilon))$ local gradient calls, where $n$ is the number of nodes, $\delta$ is the degree of the second-order similarity, $L$ is the smoothness parameter and $D$ is the diameter of the constraint set. All of these complexities (nearly) match the corresponding lower bounds. For the specific $\mu$-strongly-convex-$\mu$-strongly-concave case, our algorithm has upper bounds on communication rounds, communication complexity, and local gradient calls of $\mathcal O(\delta/\mu\log(1/\varepsilon))$, ${\mathcal O}((n+\sqrt{n}\delta/\mu)\log(1/\varepsilon))$, and $\tilde{\mathcal O}((n+(\sqrt{n}\delta+L)/\mu)\log(1/\varepsilon))$ respectively, which are also nearly tight. Furthermore, we conduct numerical experiments to show the empirical advantages of the proposed method.
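To get a feel for how the three bounds in the abstract scale, the sketch below evaluates them numerically. It is only an illustration: constants and the logarithmic factors hidden by $\tilde{\mathcal O}$ are dropped, and the function names `general_bounds` and `strongly_bounds` are hypothetical helpers, not part of the paper.

```python
import math


def general_bounds(n, delta, L, D, eps):
    """Convex-concave case: bounds from the abstract with constants dropped."""
    scale = D ** 2 / eps
    rounds = delta * scale
    communication = n + math.sqrt(n) * delta * scale
    gradient_calls = n + (math.sqrt(n) * delta + L) * scale * math.log(1 / eps)
    return rounds, communication, gradient_calls


def strongly_bounds(n, delta, L, mu, eps):
    """Strongly-convex-strongly-concave case, constants dropped."""
    log_term = math.log(1 / eps)
    rounds = delta / mu * log_term
    communication = (n + math.sqrt(n) * delta / mu) * log_term
    gradient_calls = (n + (math.sqrt(n) * delta + L) / mu) * log_term
    return rounds, communication, gradient_calls


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    print(general_bounds(n=100, delta=1.0, L=10.0, D=1.0, eps=0.01))
    print(strongly_bounds(n=100, delta=1.0, L=10.0, mu=0.1, eps=0.01))
```

In these expressions the number of communication rounds does not depend on $n$, while the communication complexity grows with $\sqrt{n}\,\delta$, and $L$ enters only the local gradient calls.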

What problem does this paper attempt to address?

This paper addresses distributed convex-concave minimax optimization in the setting where the local functions satisfy second-order similarity. It proposes a new method, Stochastic Variance-Reduced Optimistic Gradient Sliding (SVOGS), which exploits the finite-sum structure of the objective and mini-batch client sampling to reduce the number of communication rounds, the communication complexity and the number of local gradient calls. The paper proves that SVOGS is nearly optimal in all three measures and supports the analysis with numerical experiments.

### Background and Problem Description of the Paper

- **Research Background**: Distributed optimization is widely used in machine learning, especially when training models on large-scale datasets. Communication cost is the main bottleneck in this setting. To improve communication efficiency, many methods exploit the homogeneity of the local functions, in particular second-order similarity (that is, the difference between the Hessian matrix of each local function and the Hessian matrix of the global objective function is bounded).
- **Research Problem**: How can one design an efficient distributed algorithm that solves the convex-concave minimax problem while keeping both the communication complexity and the computational complexity low?

### Contributions of the Paper

- **Proposed New Method**: The SVOGS method combines stochastic variance reduction with the Optimistic Gradient Descent-Ascent (OGDA) method, and uses mini-batch client sampling to balance the number of communication rounds, the communication complexity and the computational complexity.
- **Theoretical Results**:
  - For general convex-concave minimax problems, SVOGS achieves an $\varepsilon$-duality gap within $O\left(\frac{\delta D^2}{\varepsilon}\right)$ communication rounds, $O\left(n+\frac{\sqrt{n}\delta D^2}{\varepsilon}\right)$ communication complexity and $\tilde{O}\left(n+(\sqrt{n}\delta+L)\frac{D^2}{\varepsilon}\log\frac{1}{\varepsilon}\right)$ local gradient calls.
  - For the strongly-convex-strongly-concave case, the number of communication rounds, the communication complexity and the number of local gradient calls of SVOGS are $O\left(\frac{\delta}{\mu}\log\frac{1}{\varepsilon}\right)$, $O\left(\left(n + \frac{\sqrt{n}\delta}{\mu}\right)\log\frac{1}{\varepsilon}\right)$ and $\tilde{O}\left(\left(n+\frac{\sqrt{n}\delta+L}{\mu}\right)\log\frac{1}{\varepsilon}\right)$ respectively.
- **Experimental Verification**: Experiments on real datasets show that SVOGS performs better in terms of communication rounds, communication complexity and local gradient calls.

### Structure of the Paper

1. **Introduction**: Introduces the research background, the problem definition and the main contributions of the paper.
2. **Preliminaries**: Defines the notation and assumptions, including the constraint sets and the smoothness and convex-concave properties of the local functions.
3. **Related Work**: Reviews existing distributed optimization methods and their complexity analysis.
4. **SVOGS Method**: Describes in detail the design ideas and implementation steps of the SVOGS algorithm.
5. **Complexity Analysis**: Provides the theoretical analysis of SVOGS, including upper bounds on the number of communication rounds, the communication complexity and the number of local gradient calls.
6. **Optimality Analysis**: Provides complexity lower bounds for distributed first-order optimization methods and verifies the near-optimality of SVOGS.
7. **Experimental Results**: Verifies the effectiveness of SVOGS through numerical experiments.
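The optimistic gradient idea mentioned under the contributions can be illustrated on a toy problem. The sketch below is not SVOGS: it has no client sampling, no variance reduction, no sliding and no constraint set. It only contrasts plain gradient descent-ascent with the standard optimistic update $z_{k+1} = z_k - \eta\,(2F(z_k) - F(z_{k-1}))$ on the unconstrained bilinear saddle $f(x, y) = xy$. The toy objective, the step size, and the names `operator`, `gda` and `optimistic_gda` are assumptions made for illustration.

```python
import numpy as np


def operator(z):
    """F(z) = (df/dx, -df/dy) for the toy saddle f(x, y) = x * y."""
    x, y = z
    return np.array([y, -x])


def gda(z0, step, iters):
    """Plain simultaneous gradient descent-ascent."""
    z = np.array(z0, dtype=float)
    for _ in range(iters):
        z = z - step * operator(z)
    return z


def optimistic_gda(z0, step, iters):
    """Optimistic update: z <- z - step * (2 F(z_k) - F(z_{k-1}))."""
    z = np.array(z0, dtype=float)
    prev_grad = operator(z)  # convention: treat F(z_{-1}) as F(z_0)
    for _ in range(iters):
        grad = operator(z)
        z = z - step * (2.0 * grad - prev_grad)
        prev_grad = grad
    return z


if __name__ == "__main__":
    start = [1.0, 1.0]
    print("GDA distance to saddle:       ", np.linalg.norm(gda(start, 0.3, 500)))
    print("Optimistic distance to saddle:", np.linalg.norm(optimistic_gda(start, 0.3, 500)))
```

On this problem the plain iteration spirals away from the saddle point at the origin, while the optimistic correction term makes the iterates contract toward it, which is why optimistic updates are a common building block for convex-concave problems.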

Near-Optimal Distributed Minimax Optimization under the Second-Order Similarity

Stochastic Distributed Optimization under Average Second-order Similarity: Algorithms and Analysis

Near Optimal Stochastic Algorithms for Finite-Sum Unbalanced Convex-Concave Minimax Optimization

Accelerated Primal-Dual Algorithms for Distributed Smooth Convex Optimization over Networks

A Stochastic Second-Order Proximal Method for Distributed Optimization

A Communication-efficient Linearly Convergent Algorithm with Variance Reduction for Distributed Stochastic Optimization

Distributed Stochastic Consensus Optimization With Momentum for Nonconvex Nonsmooth Problems

Distributed Optimization Based on Gradient-tracking Revisited: Enhancing Convergence Rate via Surrogation

D-SOP: Distributed Second Order Proximal Method for Convex Composite Optimization

Distributed Optimization Algorithm with Superlinear Convergence Rate

An Optimal Stochastic Algorithm for Decentralized Nonconvex Finite-sum Optimization

Simple and Optimal Stochastic Gradient Methods for Nonsmooth Nonconvex Optimization

Bregman Proximal Method for Efficient Communications under Similarity

Distributed Stochastic Optimization with Compression for Non-Strongly Convex Objectives

Diffusion Stochastic Optimization for Min-Max Problems

Federated Minimax Optimization: Improved Convergence Analyses and Algorithms

An Efficient Stochastic Algorithm for Decentralized Nonconvex-Strongly-Concave Minimax Optimization

Distributed proximal‐gradient algorithms for nonsmooth convex optimization of second‐order multiagent systems

A Zeroth-Order Variance-Reduced Method for Decentralized Stochastic Non-convex Optimization

Distributed Stochastic Subgradient Projection Algorithms for Convex Optimization

Variance-reduced accelerated methods for decentralized stochastic double-regularized nonconvex strongly-concave minimax problems