Methods for Optimization Problems with Markovian Stochasticity and Non-Euclidean Geometry

Vladimir Solodkin, Andrew Veprikov, Aleksandr Beznosikov
2024-08-04
Abstract: This paper examines a variety of classical optimization problems, including well-known minimization tasks and more general variational inequalities. We consider a stochastic formulation of these problems and, unlike most previous work, take into account the complex Markov nature of the noise. We also consider the geometry of the problem in an arbitrary non-Euclidean setting and propose four methods based on the Mirror Descent iteration technique. Theoretical analysis is provided for smooth and convex minimization problems and for variational inequalities with Lipschitz and monotone operators. The convergence guarantees obtained are optimal for first-order stochastic methods, as evidenced by the lower bound estimates provided in this paper.
Optimization and Control
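The abstract describes methods built on the Mirror Descent iteration under Markovian noise in a non-Euclidean geometry. The following is a minimal illustrative sketch under our own assumptions. It is not the paper's Algorithms 1 to 4.

The sketch runs Mirror Descent on the probability simplex with the negative-entropy mirror map \(\omega(x) = \sum_i x_i \log x_i\). This map is strongly convex with respect to the \(\ell_1\) norm on the simplex, and its Bregman divergence \(V(x, y)\) is the KL divergence \(\mathrm{KL}(y \,\|\, x)\). Stochastic gradients come from consecutive states of a two-state Markov chain instead of i.i.d. samples.

Several pieces are hypothetical and chosen only for illustration:
- the toy objective \(F(x, z) = \tfrac12 \|x - c_z\|_2^2\);
- the transition matrix;
- the plain fixed-size batch average (the paper's batching technique may differ);
- the helper names `kl_bregman`, `entropic_md_step` and `markov_mirror_descent`.

```python
import math
import random


def kl_bregman(x, y):
    """Bregman divergence V(x, y) generated by omega(x) = sum_i x_i log x_i.
    On the probability simplex this equals KL(y || x)."""
    return sum(yi * math.log(yi / xi) for xi, yi in zip(x, y) if yi > 0)


def entropic_md_step(x, g, step):
    """One Mirror Descent step: argmin_u { step * <g, u> + V(x, u) } over the
    simplex, whose closed form is u_i proportional to x_i * exp(-step * g_i)."""
    # Subtracting the max exponent avoids overflow; it cancels after normalizing.
    shift = max(-step * gi for gi in g)
    w = [xi * math.exp(-step * gi - shift) for xi, gi in zip(x, g)]
    total = sum(w)
    return [wi / total for wi in w]


# Hypothetical toy problem (not from the paper): F(x, z) = 0.5 * ||x - c_z||_2^2,
# so grad_x F(x, z) = x - c_z and f(x) = E_pi[F(x, Z)] is minimized at E_pi[c_Z].
CENTERS = [(0.6, 0.2, 0.2), (0.0, 0.5, 0.5)]
# Hypothetical two-state chain; its stationary distribution is pi = (2/3, 1/3).
TRANSITIONS = [(0.9, 0.1), (0.2, 0.8)]


def next_state(z, rng):
    """Sample Z_{t+1} given Z_t = z from the transition matrix."""
    return 0 if rng.random() < TRANSITIONS[z][0] else 1


def markov_mirror_descent(n_iters=2000, batch=10, step=0.1, seed=0):
    """Entropic Mirror Descent with batch gradients from one long Markov chain.
    Returns the ergodic average of the iterates."""
    rng = random.Random(seed)
    dim = len(CENTERS[0])
    x = [1.0 / dim] * dim  # start at the center of the simplex
    avg = [0.0] * dim
    z = 0
    for _ in range(n_iters):
        # Average the gradient over `batch` consecutive, correlated chain states.
        # The chain is never restarted, so the noise is Markovian, not i.i.d.
        g = [0.0] * dim
        for _ in range(batch):
            z = next_state(z, rng)
            for i in range(dim):
                g[i] += (x[i] - CENTERS[z][i]) / batch
        x = entropic_md_step(x, g, step)
        avg = [a + xi / n_iters for a, xi in zip(avg, x)]
    return avg


if __name__ == "__main__":
    x_bar = markov_mirror_descent()
    # Minimizer of f: E_pi[c_Z] with pi = (2/3, 1/3), i.e. (0.4, 0.3, 0.3).
    x_star = [2 / 3 * a + 1 / 3 * b for a, b in zip(*CENTERS)]
    print("averaged iterate:", [round(v, 3) for v in x_bar])
    print("V(x_bar, x_star):", kl_bregman(x_bar, x_star))
```

With these settings the averaged iterate should land near \(x^* = (0.4, 0.3, 0.3)\), the minimizer under the stationary distribution, though the exact digits depend on the seed. The entropy mirror map is used here only because its prox step has a closed form. Any mirror map that is strongly convex with respect to the chosen norm fits the same template.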
What problem does this paper attempt to address?
The paper aims to design effective algorithms for optimization problems with Markovian stochasticity and non-Euclidean geometry. Specifically, it focuses on problems with complex noise that is generated by a Markov chain. Such noise is correlated across iterations and therefore does not satisfy the independent and identically distributed (i.i.d.) assumption. In addition, the paper treats problems in an arbitrary non-Euclidean geometric setting rather than only in the traditional Euclidean space.

### Main Contributions

1. **New Algorithms**:
   - New algorithms based on the Mirror Descent (MD) and Mirror-Prox (MP) methods are proposed for minimization problems and variational inequality problems, respectively.
   - The algorithms are analyzed in the general setting of an arbitrary norm and a compatible Bregman divergence. This setting is rarely studied under Markov noise.
2. **Gradient Estimation Techniques**:
   - Two techniques for computing gradient estimators are proposed: one uses batches (Algorithms 2 and 4), and the other does not (Algorithms 1 and 3).
   - The paper derives lower bounds for minimization and variational inequality problems under Markov noise (Propositions 1 and 2). These bounds show that Algorithms 2 and 4, with a batch size of only \(\tilde{O}(1)\), achieve the optimal convergence rate (Theorems 2 and 4).
3. **Non-batch Version of the Algorithm**:
   - For the non-batch algorithm for variational inequalities (Algorithm 3), convergence is proved assuming only that the variance is bounded in expectation over the stationary distribution (Theorem 3). This is the first such result in the Markov noise setting.
4. **Deviation Bounds**:
   - As a by-product of the main results, the paper provides a new deviation bound for the empirical mean of geometrically ergodic Markov chains (Lemma 1) that holds under any norm.
To the best of the authors' knowledge, such a bound had previously been proven only in the Euclidean setting.

### Specific Problem Description

The paper studies optimization problems of the form

\[ f^* := \min_{x \in X} \{ f(x) := \mathbb{E}_{Z \sim \pi}[F(x, Z)] \}, \]

where \(\pi\) is a distribution that is typically unknown. \(X\) is a normed vector space with dual space \(X^*\), equipped with a pair of primal and dual norms \(\|\cdot\|\) and \(\|\cdot\|^*\). Let \(\omega(\cdot)\) be a proper, convex, lower-semicontinuous function that is strongly convex with respect to \(\|\cdot\|\). Then for any \(x, y \in X\), the Bregman divergence is defined as

\[ V(x, y) := \omega(y) - \omega(x) - \langle \omega'(x), y - x \rangle, \quad \omega'(x) \in \partial \omega(x). \]

### Assumption Conditions

1. **Smoothness**: The function \(f\) is \(L\)-smooth with respect to the norm \(\|\cdot\|\) on \(X\). That is, there exists \(L > 0\) such that for any \(x, y \in X\),
   \[ \|\nabla f(x) - \nabla f(y)\|^* \leq L \|x - y\|. \]
2. **Convexity**: The function \(f\) is convex on \(X\). That is, for any \(x, y \in X\),
   \[ f(y) \geq f(x) + \langle \nabla f(x), y - x \rangle. \]
3. **Markov Chain**: The noise variables \(\{Z_t\}_{t = 0}^\infty\) are a