An Environmentally Sensitive Jamming Bandits Using Improved UCB Method

Yuzhuo Zheng, Jun Wang, Shaoqing Mao, Dongmei Han
DOI: https://doi.org/10.1109/icsp48669.2020.9321093
2020-01-01
Abstract: The problem of radar countermeasures can be formulated as a game between two intelligent agents, a jammer and a radar. Combining the online learning ability of reinforcement learning (RL) with the real-time capability of game theory, which can predict the enemy's strategy before combat, an environmentally sensitive jamming bandit using an improved upper confidence bound (UCB) method is proposed for the jammer. The method speeds up the convergence of the UCB algorithm by exploiting prior information inferred from game theory, and it improves adaptability to the radar environment by clearing the prior information or increasing the exploration intensity. Numerical experiments show that the performance of a conventional reinforcement learning method degrades rapidly when the radar strategy changes suddenly, whereas the proposed method adapts to such changes better.
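The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a standard UCB1-style index, treats the game-theoretic prior as a number of pseudo-observations per jamming action, and exposes the two adaptation mechanisms the abstract names: clearing the prior and raising the exploration intensity. The class name `PriorUCB` and the parameters `prior_means`, `prior_weight`, and `c` are hypothetical.

```python
import math


class PriorUCB:
    """UCB1-style bandit seeded with prior pseudo-observations (illustrative only)."""

    def __init__(self, n_arms, prior_means=None, prior_weight=0.0, c=2.0):
        self.n_arms = n_arms
        self.c = c  # exploration intensity (assumed tunable parameter)
        self.counts = [0.0] * n_arms
        self.means = [0.0] * n_arms
        if prior_means is not None and prior_weight > 0:
            # Assumption: the prior acts like prior_weight earlier pulls of each arm.
            self.counts = [float(prior_weight)] * n_arms
            self.means = [float(m) for m in prior_means]

    def select(self):
        # Try any arm that has no real or pseudo observations yet.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        # Clamp so the logarithm stays non-negative for small fractional priors.
        total = max(sum(self.counts), 1.0)
        scores = [
            m + math.sqrt(self.c * math.log(total) / n)
            for m, n in zip(self.means, self.counts)
        ]
        return max(range(self.n_arms), key=scores.__getitem__)

    def update(self, arm, reward):
        # Incremental update of the running mean reward for the chosen arm.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]

    def clear(self):
        """Discard the prior and everything learned so far."""
        self.counts = [0.0] * self.n_arms
        self.means = [0.0] * self.n_arms

    def boost_exploration(self, factor):
        """Scale the exploration term, e.g. after a suspected strategy change."""
        self.c *= factor
```

In this sketch, a prior that favours the correct action lets `select()` pick it from the first round, which is one way convergence could be accelerated. How a change in radar strategy is detected, and when to call `clear()` or `boost_exploration()`, is left open here because the abstract does not specify it.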