Dynamic Regret of Distributed Online Saddle Point Problem
Wentao Zhang, Yang Shi, Baoyong Zhang, Deming Yuan, Shengyuan Xu
DOI: https://doi.org/10.1109/tac.2023.3312033
IF: 6.549
2024-01-01
IEEE Transactions on Automatic Control
Abstract: This paper addresses the distributed online saddle point problem for multi-agent networks over time-varying graphs. The objective is to design distributed online algorithms that optimize the accumulated time-varying loss functions, and to analyze the efficiency of these algorithms using the so-called dynamic regret as the performance measure. To this end, two distributed online optimization algorithms are developed, one for full-information feedback and one for bandit feedback. Under certain assumptions and with appropriately diminishing step sizes, it is proved that the two algorithms attain dynamic regret upper bounds of order $\mathcal{O}(\max \lbrace T^{\lambda_{1}}, T^{\lambda_{2}} (V_{T}+1) \rbrace)$ and order $\mathcal{O}(\max \lbrace k T^{\lambda_{3}}, k T^{\lambda_{4}} (V_{T}+1) \rbrace)$, respectively, where $T$ is the horizon (total number of iterations) of the algorithm, $V_{T}$ is the path-variation level, $k$ is the maximum dimension of the state variables, and $\lambda_{i}$, $i=1,2,3,4$, are positive tuning parameters less than 1. Consequently, both algorithms guarantee that the dynamic regret is sublinear in $T$, provided that the path-variation level $V_{T}$ is sublinear in $T$. This establishes the efficiency of the developed algorithms theoretically. Their efficiency is also illustrated numerically on an example of a specific bilinear matrix game problem.
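To make the full-information setting concrete, the following is a minimal Python sketch of a consensus-based projected gradient descent-ascent loop on a time-varying bilinear matrix game. It is not the algorithm from the paper. The local loss $x^{\top} A_{i,t}\, y$ over probability simplices, the fixed doubly stochastic mixing matrix `W` (the paper considers time-varying graphs), the step size $1/\sqrt{t}$, and the names `project_simplex`, `mix`, `drifting_payoff`, and `run_sketch` are all assumptions introduced here for illustration.

```python
import math

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        css += uj
        t = (css - 1.0) / j
        if uj - t > 0:          # condition holds on a prefix; keep the last hit
            theta = t
    return [max(x - theta, 0.0) for x in v]

def mix(weights, vectors):
    """One consensus step: weighted average of all agents' vectors."""
    dim = len(vectors[0])
    return [sum(w * vec[d] for w, vec in zip(weights, vectors)) for d in range(dim)]

def drifting_payoff(i, t):
    """Hypothetical slowly drifting 2x2 payoff matrix for agent i at round t."""
    c = math.cos(0.01 * t + i)
    return [[c, -c], [-c, c]]

def run_sketch(payoff, W, horizon, m, p):
    """Consensus + projected gradient descent (x) / ascent (y), step 1/sqrt(t)."""
    n = len(W)
    x = [[1.0 / m] * m for _ in range(n)]   # minimizing players start uniform
    y = [[1.0 / p] * p for _ in range(n)]   # maximizing players start uniform
    for t in range(1, horizon + 1):
        eta = 1.0 / math.sqrt(t)            # assumed diminishing step size
        x_mix = [mix(W[i], x) for i in range(n)]
        y_mix = [mix(W[i], y) for i in range(n)]
        for i in range(n):
            A = payoff(i, t)                # full-information feedback: A is revealed
            gx = [sum(A[r][c] * y_mix[i][c] for c in range(p)) for r in range(m)]
            gy = [sum(A[r][c] * x_mix[i][r] for r in range(m)) for c in range(p)]
            x[i] = project_simplex([a - eta * g for a, g in zip(x_mix[i], gx)])
            y[i] = project_simplex([b + eta * g for b, g in zip(y_mix[i], gy)])
    return x, y

# Three agents with a fixed doubly stochastic mixing matrix (an assumption).
W = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
x_final, y_final = run_sketch(drifting_payoff, W, horizon=50, m=2, p=2)
```

Each agent first averages its neighbors' iterates, then takes a descent step in $x$ and an ascent step in $y$ along the gradients of its own local loss, and finally projects back onto the simplex so the iterates remain valid mixed strategies. A bandit-feedback variant would replace the exact gradients `gx` and `gy` with estimates built from function values only; that is not sketched here.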