Stochastic Constrained Decentralized Optimization for Machine Learning with Fewer Data Oracles: a Gradient Sliding Approach

Hoang Huy Nguyen, Yan Li, Tuo Zhao
2024-04-03
Abstract: In modern decentralized applications, ensuring communication efficiency and privacy for the users are the key challenges. In order to train machine-learning models, the algorithm has to communicate to the data center and sample data for its gradient computation, thus exposing the data and increasing the communication cost. This gives rise to the need for a decentralized optimization algorithm that is communication-efficient and minimizes the number of gradient computations. To this end, we propose the primal-dual sliding with conditional gradient sliding framework, which is communication-efficient and achieves an $\varepsilon$-approximate solution with the optimal gradient complexity of $O(1/\sqrt{\varepsilon}+\sigma^2/{\varepsilon^2})$ and $O(\log(1/\varepsilon)+\sigma^2/\varepsilon)$ for the convex and strongly convex setting respectively and an LO (Linear Optimization) complexity of $O(1/\varepsilon^2)$ for both settings given a stochastic gradient oracle with variance $\sigma^2$. Compared with the prior work \cite{wai-fw-2017}, our framework relaxes the assumption of the optimal solution being a strict interior point of the feasible set and enjoys wider applicability for large-scale training using a stochastic gradient oracle. We also demonstrate the efficiency of our algorithms with various numerical experiments.
Optimization and Control, Machine Learning
What problem does this paper attempt to address?
The paper addresses two key challenges in modern decentralized applications: communication efficiency and user privacy. During model training, an algorithm must communicate with a data center and sample data to compute gradients, which both exposes the data and increases communication cost. The paper therefore aims for a decentralized optimization algorithm that is communication-efficient and minimizes the number of gradient computations.

To this end, the authors propose the primal-dual sliding with conditional gradient sliding framework. Given a stochastic gradient oracle with variance \(\sigma^2\), the method finds an \(\epsilon\)-approximate solution with optimal gradient complexity: \(O(1/\sqrt{\epsilon}+\sigma^2/\epsilon^2)\) in the convex setting and \(O(\log(1/\epsilon)+\sigma^2/\epsilon)\) in the strongly convex setting. Its linear optimization (LO) complexity is \(O(1/\epsilon^2)\) in both settings. Compared with prior work, the framework no longer assumes that the optimal solution is a strict interior point of the feasible set, and its use of a stochastic gradient oracle makes it more widely applicable to large-scale training.

The contributions of the paper can be summarized as follows:

1. **Reducing Data Access Frequency**: Applying the conditional gradient sliding method to the linear optimization subproblem yields better sample complexity, so the algorithm accesses stored data less often.
2. **Supporting Stochastic Gradients**: Unlike previous methods that required exact gradients, the new method works with stochastic gradients, making the algorithm more robust and better suited to machine learning scenarios.
3. **Theoretical Analysis**: The paper provides a linear optimization complexity analysis for convex and strongly convex functions, with a complexity of \(O(1/\epsilon^2)\) in both cases.

The experimental section validates the proposed method on convex smooth optimization problems such as unregularized logistic regression. In comparisons with other methods, the new method reduces the number of gradient computations while remaining communication-efficient.
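
To make the terms "stochastic gradient oracle" and "linear optimization (LO) oracle" concrete, the following is a minimal single-node sketch of a plain stochastic conditional gradient (Frank-Wolfe) loop on unregularized logistic regression. It is not the authors' decentralized primal-dual sliding algorithm, and it does not implement conditional gradient sliding. The l1-ball constraint, the step size \(2/(t+2)\), the toy data generator, and all function names are assumptions chosen for illustration.

```python
import math
import random


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def lo_oracle_l1(grad, radius):
    """LO oracle: argmin of <grad, v> over the l1 ball {v : ||v||_1 <= radius}."""
    # The minimizer is a signed vertex along the largest-magnitude coordinate.
    i = max(range(len(grad)), key=lambda j: abs(grad[j]))
    v = [0.0] * len(grad)
    if grad[i] != 0.0:
        v[i] = -radius * math.copysign(1.0, grad[i])
    return v


def stochastic_logistic_grad(w, data, batch_size, rng):
    """Stochastic gradient oracle: mini-batch gradient of the logistic loss.

    Labels are assumed to be in {-1, +1}.
    """
    batch = rng.sample(data, batch_size)
    g = [0.0] * len(w)
    for x, y in batch:
        coeff = -y / (1.0 + math.exp(y * dot(w, x)))
        for j, xj in enumerate(x):
            g[j] += coeff * xj / batch_size
    return g


def logistic_loss(w, data):
    """Average unregularized logistic loss over the full data set."""
    return sum(math.log1p(math.exp(-y * dot(w, x))) for x, y in data) / len(data)


def stochastic_frank_wolfe(data, dim, radius, steps, batch_size, seed=0):
    """Projection-free loop: one gradient-oracle call and one LO call per step."""
    rng = random.Random(seed)
    w = [0.0] * dim
    for t in range(steps):
        g = stochastic_logistic_grad(w, data, batch_size, rng)
        v = lo_oracle_l1(g, radius)
        gamma = 2.0 / (t + 2.0)  # standard Frank-Wolfe step size (assumed here)
        # A convex combination of feasible points stays inside the l1 ball.
        w = [(1.0 - gamma) * wi + gamma * vi for wi, vi in zip(w, v)]
    return w


def make_toy_data(n, seed=0):
    """Hypothetical synthetic data: 3 features, label = sign(2*x0 - x1)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in range(3)]
        y = 1.0 if 2.0 * x[0] - x[1] > 0 else -1.0
        data.append((x, y))
    return data
```

Calling `stochastic_frank_wolfe(make_toy_data(200), dim=3, radius=2.0, steps=200, batch_size=32)` returns a feasible iterate without ever computing a projection. In this plain loop every LO call is paired with a fresh gradient sample. As summarized above, the paper's framework is instead designed to reach its LO complexity with fewer gradient computations.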