Stochastic Constrained Decentralized Optimization for Machine Learning with Fewer Data Oracles: a Gradient Sliding Approach

Hoang Huy Nguyen, Yan Li, Tuo Zhao

2024-04-03

Abstract: In modern decentralized applications, ensuring communication efficiency and user privacy are the key challenges. To train machine-learning models, the algorithm has to communicate with the data center and sample data for its gradient computations, which exposes the data and increases the communication cost. This creates the need for a decentralized optimization algorithm that is communication-efficient and minimizes the number of gradient computations. To this end, we propose the primal-dual sliding with conditional gradient sliding framework, which is communication-efficient and achieves an $\varepsilon$-approximate solution with the optimal gradient complexity of $O(1/\sqrt{\varepsilon}+\sigma^2/\varepsilon^2)$ and $O(\log(1/\varepsilon)+\sigma^2/\varepsilon)$ for the convex and strongly convex settings, respectively, and a linear optimization (LO) complexity of $O(1/\varepsilon^2)$ in both settings, given a stochastic gradient oracle with variance $\sigma^2$. Compared with the prior work \cite{wai-fw-2017}, our framework relaxes the assumption that the optimal solution is a strict interior point of the feasible set and enjoys wider applicability for large-scale training using a stochastic gradient oracle. We also demonstrate the efficiency of our algorithms with various numerical experiments.
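The LO complexity above counts calls to a linear optimization oracle, which returns a minimizer of a linear function over the feasible set. As a rough illustration, and not the authors' algorithm, the sketch below follows the inner Frank-Wolfe loop used in conditional gradient sliding: it approximately solves the projection-type subproblem $\min_{x\in X}\langle g,x\rangle+\frac{\beta}{2}\|x-u\|^2$ using only LO calls, stopping once the Frank-Wolfe gap falls below a tolerance $\eta$. Choosing the probability simplex as $X$, warm-starting at $u$, and the names `simplex_lo` and `cndg` are illustrative assumptions.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def simplex_lo(c: Sequence[float]) -> Vector:
    """LO oracle over the probability simplex (assumed feasible set):
    argmin_{x in simplex} <c, x> is the vertex e_i with i = argmin_i c_i."""
    i = min(range(len(c)), key=lambda j: c[j])
    return [1.0 if j == i else 0.0 for j in range(len(c))]


def cndg(g: Sequence[float], u: Sequence[float], beta: float, eta: float,
         lo: Callable[[Sequence[float]], Vector],
         max_iter: int = 10_000) -> Tuple[Vector, int]:
    """Approximately minimize <g, x> + beta/2 * ||x - u||^2 over X using only LO calls.

    Simplified sketch: warm-starts at u (assumed feasible) and stops when the
    Frank-Wolfe gap <grad, x - v> is at most eta. Returns (x, number of LO calls)."""
    x = list(u)
    calls = 0
    for _ in range(max_iter):
        # Gradient of the quadratic subproblem at the current point.
        grad = [gi + beta * (xi - ui) for gi, xi, ui in zip(g, x, u)]
        v = lo(grad)
        calls += 1
        d = [vi - xi for vi, xi in zip(v, x)]
        gap = -dot(grad, d)  # Frank-Wolfe gap <grad, x - v> >= 0
        if gap <= eta:
            break
        # Exact line search along d for the quadratic, clipped to stay in X.
        step = min(1.0, gap / (beta * dot(d, d)))
        x = [xi + step * di for xi, di in zip(x, d)]
    return x, calls
```

For example, with `g = [1.0, -1.0, 0.5]`, `u` at the simplex barycenter and `beta = 1.0`, a single step with exact line search reaches the vertex `[0, 1, 0]`, which is the Euclidean projection of `u - g` onto the simplex, and a second LO call certifies a zero gap. Each subproblem is thus handled with LO calls alone, never a full projection.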

Optimization and Control, Machine Learning

What problem does this paper attempt to address?

The paper addresses two key challenges in modern decentralized applications: communication efficiency and user privacy. It focuses on the training phase of machine-learning models, where the algorithm must communicate with data centers and sample data to compute gradients, which both exposes the data and increases communication costs. The goal is therefore a decentralized optimization algorithm that is communication-efficient and minimizes the number of gradient computations.

To achieve this goal, the authors combine primal-dual sliding with conditional gradient sliding. The resulting method finds an $\varepsilon$-approximate solution with optimal gradient complexity: $O(1/\sqrt{\varepsilon}+\sigma^2/\varepsilon^2)$ for convex functions and $O(\log(1/\varepsilon)+\sigma^2/\varepsilon)$ for strongly convex functions, where $\sigma^2$ is the variance of the stochastic gradient oracle. Its linear optimization (LO) complexity is $O(1/\varepsilon^2)$ in both settings. Compared with previous work, the framework no longer requires the optimal solution to be a strict interior point of the feasible set, and its use of a stochastic gradient oracle makes it applicable to large-scale training.

The contributions of the paper can be summarized as follows:
1. **Reducing Data Access Frequency**: By applying the conditional gradient sliding method to the linear optimization subproblem, the method achieves better sample complexity and therefore accesses the data store less often.
2. **Supporting Stochastic Gradients**: Unlike previous methods that required exact gradients, the new method works with stochastic gradients, which makes it better suited to large-scale machine learning, where exact gradients are expensive to compute.
3. **Theoretical Analysis**: The paper provides an LO complexity analysis for convex and strongly convex functions, with a complexity of $O(1/\varepsilon^2)$ in both cases.

The experimental section validates the effectiveness of the proposed method, particularly on convex smooth optimization problems such as unregularized logistic regression. Comparisons with other methods show that the new method reduces the number of gradient computations while maintaining communication efficiency.
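The complexities above are stated for a stochastic gradient oracle with variance $\sigma^2$, and the experiments use unregularized logistic regression. The sketch below is a generic mini-batch stochastic first-order oracle for the average logistic loss on one node's local data, not the authors' code. The class name `MiniBatchOracle`, uniform sampling with replacement, and the `samples_accessed` counter (a stand-in for the data accesses the method aims to reduce) are assumptions for illustration.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def sigmoid(t: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-t))."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def logistic_grad(w: Sequence[float], x: Sequence[float], y: int) -> Vector:
    """Gradient in w of log(1 + exp(-y * <w, x>)), with label y in {-1, +1}."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    coef = -y * sigmoid(-margin)
    return [coef * xi for xi in x]


def full_gradient(w: Sequence[float], X, Y) -> Vector:
    """Exact gradient of the average (unregularized) logistic loss: touches every sample."""
    g = [0.0] * len(w)
    for x, y in zip(X, Y):
        g = [a + b for a, b in zip(g, logistic_grad(w, x, y))]
    return [a / len(X) for a in g]


class MiniBatchOracle:
    """Hypothetical stochastic oracle: averages per-sample gradients over `batch_size`
    indices drawn uniformly with replacement. The estimate is unbiased for
    full_gradient, and its variance is the per-sample variance divided by batch_size."""

    def __init__(self, X, Y, batch_size: int, seed: int = 0):
        self.X, self.Y = X, Y
        self.batch_size = batch_size
        self.rng = random.Random(seed)
        self.samples_accessed = 0  # counts data accesses

    def __call__(self, w: Sequence[float]) -> Vector:
        g = [0.0] * len(w)
        for _ in range(self.batch_size):
            i = self.rng.randrange(len(self.X))
            g = [a + b for a, b in zip(g, logistic_grad(w, self.X[i], self.Y[i]))]
        self.samples_accessed += self.batch_size
        return [a / self.batch_size for a in g]
```

Averaging a mini-batch of size $b$ divides the per-sample variance by $b$, so a larger batch lowers the effective $\sigma^2$ at the price of more data accesses per oracle call.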
