Distributed reinforcement learning algorithm of operator service slice competition prediction based on zero-sum Markov game

Guomin Wu, Guoping Tan, Jinxin Deng, Defu Jiang
DOI: https://doi.org/10.1016/j.neucom.2021.01.061
IF: 6
2021-06-01
Neurocomputing
Abstract: As a key enabling technology in emerging networks, network slicing can dynamically provide on-demand services through distinct logical slice instances. While most related studies have focused on resource management, this study addresses business competition between two operator slices using artificial intelligence. In this competition, each operator slice tries to maximize its own payoff while its opponent strives to minimize it. Moreover, the two operators update their marketing strategies over time, so predicting the outcome of the competition is challenging. After modeling the problem as a zero-sum Markov game, we present the min-max Q learning algorithm. In each market state, each slice obtains its temporary optimal strategy using the min-max algorithm. Over the Markov decision process, the Q value is dynamically updated under different market states, and the final Q value gives the predicted result of the competition. Finally, extensive numerical results show that the min-max Q learning algorithm outperforms the repeated game, in which the market state does not change over time.
computer science, artificial intelligence
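
The abstract describes a Q value that is updated across market states, with each slice choosing a min-max strategy in every state. A minimal sketch of that idea follows. It is not the authors' implementation: the two market states, the two actions per slice, the payoff table `REWARD`, and the transition rule `next_state` are all made up for illustration. It also uses a pure-strategy maximin (best worst-case action), whereas the standard minimax-Q formulation solves a linear program for a mixed strategy.

```python
import random

# Hypothetical toy market: 2 market states, 2 marketing actions per slice.
N_STATES, N_ACTIONS = 2, 2
GAMMA, ALPHA = 0.9, 0.2  # discount factor and learning rate (assumed values)

# REWARD[s][a][o]: payoff to slice A in state s when A plays a and B plays o.
# The game is zero-sum, so slice B receives the negative. Values are invented.
REWARD = [
    [[2.0, 1.0], [0.0, 3.0]],
    [[-1.0, 0.0], [1.0, -2.0]],
]

def next_state(s, a, o):
    # Assumed rule: the market flips state when both slices pick the same action.
    return 1 - s if a == o else s

def maximin_value(q_s):
    # Pure-strategy security level of slice A: best row under the worst column.
    return max(min(row) for row in q_s)

def maximin_action(q_s):
    # Action of slice A that attains the security level above.
    return max(range(len(q_s)), key=lambda a: min(q_s[a]))

def train(steps=20000, seed=0):
    rng = random.Random(seed)
    # q[s][a][o] estimates slice A's long-run payoff for the joint action (a, o).
    q = [[[0.0] * N_ACTIONS for _ in range(N_ACTIONS)] for _ in range(N_STATES)]
    s = 0
    for _ in range(steps):
        # Both slices explore uniformly at random so every joint action is visited.
        a = rng.randrange(N_ACTIONS)
        o = rng.randrange(N_ACTIONS)
        s2 = next_state(s, a, o)
        # Min-max Q update: bootstrap from the maximin value of the next state.
        target = REWARD[s][a][o] + GAMMA * maximin_value(q[s2])
        q[s][a][o] += ALPHA * (target - q[s][a][o])
        s = s2
    return q

if __name__ == "__main__":
    q = train()
    for s in range(N_STATES):
        print("state", s, "value", maximin_value(q[s]), "action", maximin_action(q[s]))
```

In this sketch the learned `maximin_value(q[s])` plays the role of the predicted competition outcome for market state `s`. Setting `GAMMA` to 0 removes the dependence on future market states, which is loosely analogous to the repeated-game baseline mentioned in the abstract.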