Deep Reinforcement Learning for Autonomous Cyber Defence: A Survey

Gregory Palmer, Chris Parry, Daniel J.B. Harrold, Chris Willis
2024-09-27
Abstract: The rapid increase in the number of cyber-attacks in recent years raises the need for principled methods for defending networks against malicious actors. Deep reinforcement learning (DRL) has emerged as a promising approach for mitigating these attacks. However, while DRL has shown much potential for cyber defence, numerous challenges must be overcome before DRL can be applied to the autonomous cyber defence (ACD) problem at scale. Principled methods are required for environments that confront learners with very high-dimensional state spaces, large multi-discrete action spaces, and adversarial learning. Recent works have reported success in solving these problems individually. There have also been impressive engineering efforts towards solving all three for real-time strategy games. However, applying DRL to the full ACD problem remains an open challenge. Here, we survey the relevant DRL literature and conceptualize an idealised ACD-DRL agent. We provide: i.) A summary of the domain properties that define the ACD problem; ii.) A comprehensive comparison of current ACD environments used for benchmarking DRL approaches; iii.) An overview of state-of-the-art approaches for scaling DRL to domains that confront learners with the curse of dimensionality; and iv.) A survey and critique of current methods for limiting the exploitability of agents within adversarial settings from the perspective of ACD. We conclude with open research questions that we hope will motivate future directions for researchers and practitioners working on ACD.
Machine Learning
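To make the abstract's point about large multi-discrete action spaces concrete, the following minimal Python sketch compares the size of a flat joint action space with the number of outputs a factored (per-dimension) policy head needs. The host count, the per-host action set and all function names are hypothetical, chosen only for illustration; they are not taken from the survey or from any specific ACD environment.

```python
import math
import random


def joint_action_count(dims):
    """Size of a flat (joint) action space: the product of per-dimension sizes."""
    return math.prod(dims)


def factored_output_count(dims):
    """Logits needed by a factored policy head: one per option per dimension."""
    return sum(dims)


def sample_factored_action(logits_per_dim, rng):
    """Sample one sub-action per dimension, each from its own softmax.

    Sub-actions are drawn independently given the logits, i.e. this assumes
    the policy factorises across action dimensions.
    """
    action = []
    for logits in logits_per_dim:
        m = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(z - m) for z in logits]
        action.append(rng.choices(range(len(logits)), weights=weights, k=1)[0])
    return action


# Hypothetical defender: 20 hosts, each with 5 options
# (e.g. no-op, monitor, isolate, remove, restore). Illustrative numbers only.
dims = [5] * 20
print(joint_action_count(dims))     # 95367431640625 (= 5**20)
print(factored_output_count(dims))  # 100

rng = random.Random(0)
uniform_logits = [[0.0] * k for k in dims]
action = sample_factored_action(uniform_logits, rng)
print(len(action))  # 20: one sub-action per host
```

The factored head keeps the output layer linear in the number of hosts rather than exponential, but treating sub-actions as independent can miss dependencies between actions; autoregressive action heads are one common way to relax that assumption.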
What problem does this paper attempt to address?
Given the sharp increase in the number of cyber-attacks in recent years, the paper addresses how deep reinforcement learning (DRL) can be used to build a responsive, adaptable and scalable autonomous cyber defence (ACD) system. It focuses on three challenges:

1. **Effective handling and exploration of high-dimensional state spaces**: In network defence scenarios, the state space is usually very large and complex, which places high demands on the learning efficiency of DRL models.
2. **Large combinatorial action spaces**: Network defence involves a wide variety of actions, and there may be complex combinatorial relationships among them, which makes decision-making harder.
3. **Minimizing the exploitability of DRL agents in adversarial environments**: Attackers may target weaknesses in DRL systems, so DRL agents must remain robust and secure in the face of malicious behaviour.

To address these challenges, the paper reviews the existing DRL literature and proposes a conceptual framework for an idealised ACD-DRL agent. The authors also provide a comprehensive comparison of the ACD environments currently used to benchmark DRL methods, and an overview of methods for extending DRL to high-dimensional state spaces and large action spaces. Finally, from the perspective of adversarial learning, the paper evaluates existing methods for limiting the exploitability of agents in adversarial settings and points out directions for future research. Through these efforts, the paper aims to give researchers and practitioners a systematic perspective on how DRL can be used to solve practical ACD problems.
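The exploitability discussed above can be made precise in the simplest adversarial setting, a two-player zero-sum matrix game. The minimal Python sketch below computes NashConv: the sum of the gains each player could obtain by unilaterally switching to a best response, which is zero exactly at a Nash equilibrium (some works report this quantity averaged over players as "exploitability"). The two-host "monitor versus attack" payoff matrix and all function names are hypothetical illustrations, not the survey's formulation.

```python
def expected_payoffs_rows(A, y):
    """Defender's expected payoff for each pure row, against attacker mix y."""
    return [sum(a * p for a, p in zip(row, y)) for row in A]


def expected_payoffs_cols(A, x):
    """Defender's expected payoff for each pure attacker column, given defender mix x."""
    return [sum(x[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]


def nash_conv(A, x, y):
    """Sum of both players' best-response gains in a zero-sum matrix game.

    A[i][j] is the defender's (row player's) payoff; the attacker receives -A[i][j].
    Returns 0 exactly when the mixed strategies (x, y) form a Nash equilibrium.
    """
    row_values = expected_payoffs_rows(A, y)
    value = sum(xi * v for xi, v in zip(x, row_values))
    defender_gain = max(row_values) - value                 # defender's best-response gain
    attacker_gain = value - min(expected_payoffs_cols(A, x))  # attacker's best-response gain
    return defender_gain + attacker_gain


# Hypothetical game: the defender monitors host 0 or 1, the attacker targets
# host 0 or 1; the defender scores +1 if it monitors the attacked host, else -1.
A = [[1.0, -1.0],
     [-1.0, 1.0]]

print(nash_conv(A, [0.5, 0.5], [0.5, 0.5]))  # 0.0: uniform mixing is unexploitable
print(nash_conv(A, [1.0, 0.0], [0.5, 0.5]))  # 1.0: attacker gains by always hitting host 1
```

The second call shows why a deterministic defender is a poor target for training: even if the current attacker is random, a best-responding attacker can exploit it, whereas the mixed equilibrium strategy leaves no profitable deviation.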