QFlip: An Adaptive Reinforcement Learning Strategy for the FlipIt Security Game

Lisa Oakley, Alina Oprea
DOI: https://doi.org/10.1007/978-3-030-32430-8_22
2019-01-01
Abstract: A rise in Advanced Persistent Threats (APTs) has introduced a need for robustness against long-running, stealthy attacks which circumvent existing cryptographic security guarantees. FlipIt is a security game that models attacker-defender interactions in advanced scenarios such as APTs. Previous work extensively analyzed non-adaptive strategies in FlipIt, but adaptive strategies arise naturally in practical interactions as players receive feedback during the game. We model the FlipIt game as a Markov Decision Process and introduce QFlip, an adaptive strategy for FlipIt based on temporal difference reinforcement learning. We prove theoretical results on the convergence of our new strategy against an opponent playing with a Periodic strategy. We confirm our analysis experimentally by extensive evaluation of QFlip against specific opponents. QFlip converges to the optimal adaptive strategy for Periodic and Exponential opponents using associated state spaces. Finally, we introduce a generalized QFlip strategy with composite state space that outperforms a Greedy strategy for several distributions including Periodic and Uniform, without prior knowledge of the opponent’s strategy. We also release an OpenAI Gym environment for FlipIt to facilitate future research.
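To make the idea of temporal difference learning against a Periodic opponent more concrete, the following is a minimal sketch of tabular Q-learning in a toy, discrete-time FlipIt-like game. It is not the authors' QFlip implementation and does not use their Gym environment. The function names (`train_q_learning_sketch`, `greedy_action`), the reward shape (+1 per tick in control, minus a move cost per flip), the tick ordering (opponent moves first), and the state definition (ticks since the opponent's last known flip, revealed only when the agent flips) are all assumptions chosen for illustration.

```python
import random


def train_q_learning_sketch(period=5, move_cost=1.0, steps=100_000,
                            alpha=0.1, gamma=0.8, epsilon=0.1, seed=0):
    """Tabular Q-learning against a Periodic opponent in a toy FlipIt-like game.

    Assumptions (not taken from the paper): discrete ticks, the opponent
    moves first within a tick, and the reward is +1 per tick in control
    minus move_cost per flip.
    """
    rng = random.Random(seed)
    max_tau = 3 * period                          # cap on the observed state
    q = [[0.0, 0.0] for _ in range(max_tau + 1)]  # q[tau][action]

    t = 0                  # current tick
    opp_last = 0           # true time of the opponent's last flip
    known = 0              # opponent's last flip time as known to the agent
    agent_controls = True  # assumption: the agent starts in control
    state = 0              # ticks since the opponent's last known flip

    for _ in range(steps):
        # Periodic opponent: flips every `period` ticks and takes control.
        if t > 0 and t % period == 0:
            opp_last = t
            agent_controls = False

        # Epsilon-greedy action selection: 0 = wait, 1 = flip.
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = 1 if q[state][1] > q[state][0] else 0

        reward = 0.0
        if action == 1:
            agent_controls = True
            known = opp_last   # flipping reveals the opponent's last move
            reward -= move_cost
        if agent_controls:
            reward += 1.0

        t += 1
        next_state = min(t - known, max_tau)

        # Temporal difference (Q-learning) update.
        td_target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (td_target - q[state][action])
        state = next_state

    return q


def greedy_action(q, tau):
    """Return the greedy action (0 = wait, 1 = flip) for state tau."""
    return 1 if q[tau][1] > q[tau][0] else 0
```

In this toy setting, calling `q = train_q_learning_sketch(period=5)` and then evaluating `greedy_action(q, tau)` for `tau` from 1 to 5 should yield a policy that waits while `tau < period` and flips at `tau == period`, that is, immediately after the opponent's move. This mirrors the intuition of playing right after a Periodic opponent, but only under the simplifying assumptions stated above.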