Learning with Guarantee via Constrained Multi-armed Bandit: Theory and Network Applications
Kechao Cai, Xutong Liu, Yu-Zhen Janice Chen, John C.S. Lui
DOI: https://doi.org/10.1109/tmc.2022.3173792
IF: 6.075
2022-01-01
IEEE Transactions on Mobile Computing
Abstract: Several studies have considered optimizing network applications in an online learning context using multi-armed bandit models. However, existing frameworks are problematic because they only seek the optimal decisions that minimize the regret and neglect the constraint (or guarantee) requirements, which may be excessively violated. In this paper, we formulate the stochastic constrained multi-armed bandit model with either “time-varying” or “stochastic” multi-level rewards for optimizing network applications with guarantees, taking both regret and violation into consideration. Alongside this model, we design two constrained multi-armed bandit policies, Learning with Guarantee with time-Varying rewards (LG-V) and Learning with Guarantee with Stochastic rewards (LG-S), with provable sub-linear regret and violation bounds. Moreover, we illustrate how our policies can be applied to several emerging network application optimizations, namely, (1) opportunistic multichannel selection, (2) data-guaranteed mobile crowdsensing, and (3) stability-guaranteed crowdsourced transcoding. To show the effectiveness of LG-V and LG-S in optimizing these applications with different requirements, we conduct extensive simulations comparing both policies with existing state-of-the-art policies. We also show how parameter variations, namely, variations of the guarantee threshold and of the number of selected arms, affect the regrets and violations of LG-V and LG-S.
computer science, information systems, telecommunications