Abstract: Cyber-attacks pose a security threat to military command and control networks, Intelligence, Surveillance, and Reconnaissance (ISR) systems, and civilian critical national infrastructure. The use of artificial intelligence and autonomous agents in these attacks increases the scale, range, and complexity of this threat and of the disruption it causes. Autonomous Cyber Defence (ACD) agents aim to mitigate this threat by responding at machine speed and at the scale required to address the problem. Sequential decision-making algorithms such as Deep Reinforcement Learning (RL) provide a promising route to creating ACD agents. These algorithms focus on a single objective, such as minimizing the intrusion of red agents on the network, by using a handcrafted weighted sum of rewards. This approach removes the ability to adapt the model during inference and fails to address the many competing objectives present when operating and protecting these networks. Conflicting objectives must be carefully balanced: for example, restoring a machine from a back-up image must be weighed against the cost of the associated down-time and the disruption to network traffic or services that might result. Instead of pursuing a Single-Objective RL (SORL) approach, we present a simple example of a multi-objective network defence game that requires consideration of both defending the network against red agents and maintaining the critical functionality of green agents. Two Multi-Objective Reinforcement Learning (MORL) algorithms, namely Multi-Objective Proximal Policy Optimization (MOPPO) and Pareto-Conditioned Networks (PCN), are used to create two trained ACD agents whose performance is compared on our Multi-Objective Cyber Defence game. The benefits and limitations of MORL ACD agents in comparison to SORL ACD agents are discussed based on the investigations of this game.

QFlip: an Adaptive Reinforcement Learning Strategy for the FlipIt Security Game

Finding Effective Security Strategies through Reinforcement Learning and Self-Play

A Dynamic Games Approach to Proactive Defense Strategies against Advanced Persistent Threats in Cyber-Physical Systems

Adaptive Strategic Cyber Defense for Advanced Persistent Threats in Critical Infrastructure Networks

Adaptive Attacker Strategy Development Against Moving Target Cyber Defenses

Learning Security Strategies through Game Play and Optimal Stopping

Adversarial Decision-Making for Moving Target Defense: A Multi-Agent Markov Game and Reinforcement Learning Approach

An Improved Approach Towards Multi-Agent Pursuit–Evasion Game Decision-Making Using Deep Reinforcement Learning

Learning Adversary Behavior in Security Games: A PAC Model Perspective

Optimizing Cyber Defense in Dynamic Active Directories through Reinforcement Learning

Multi-Objective Reinforcement Learning for Automated Resilient Cyber Defence

Adversarial Online Learning with Variable Plays in the Pursuit-Evasion Game: Theoretical Foundations and Application in Connected and Automated Vehicle Cybersecurity

Dynamic Defense Strategy Against Advanced Persistent Threat under Heterogeneous Networks

A method of network attack-defense game and collaborative defense decision-making based on hierarchical multi-agent reinforcement learning

Security Defense Strategy Algorithm for Internet of Things Based on Deep Reinforcement Learning

Learning Near-Optimal Intrusion Responses Against Dynamic Attackers

Defending Against Advanced Persistent Threats using Game-Theory

A Game-Theoretical Self-Adaptation Framework for Securing Software-Intensive Systems

Adversarial Deep Reinforcement Learning for Cyber Security in Software Defined Networks

Reinforcement Learning vs Genetic Algorithms in Game-Theoretic Cyber-Security