AUTOMA: Automated Generation of Attack Hypotheses and Their Variants for Threat Hunting Using Knowledge Discovery
Boubakr Nour, Makan Pourzandi, Rushaan Kamran Qureshi, Mourad Debbabi
DOI: https://doi.org/10.1109/tnsm.2024.3378972
2024-01-01
IEEE Transactions on Network and Service Management
Abstract: Threat hunting is a proactive line of security defense exercised to uncover attacks that could circumvent conventional detection mechanisms. It is based on an iterative approach to generate, inspect, and revise attack hypotheses. The quality of these hypotheses is essential to prove or refute the existence of an attack. Today, attack hypotheses are often generated manually by security analysts. The generation process requires elusive expertise, is costly, and is prone to producing a large number of irrelevant hypotheses without considering attack variants. In this paper, we address these challenges by designing AUTOMA, a solution that automates the generation of relevant hypotheses and their variants using knowledge discovery. AUTOMA combines system telemetry with a knowledge base of existing attacks, techniques, and their relationships to mine the most relevant hypotheses. To increase the relevance of the generated hypotheses, AUTOMA examines them by applying matching-based similarity, success, likelihood, and criticality evaluations. These evaluations are based on the past occurrences, in the system telemetry and the knowledge base, of the techniques that are part of a hypothesis. Additionally, AUTOMA uses sequence success, sequence alignment, and hierarchical similarity approaches to generate potential attack variants of a hypothesis, taking into account the dynamism and stealthiness of attackers in coming up with alternative attack steps. We extensively evaluate the effectiveness and efficiency of AUTOMA using a real dataset of 284 attack campaigns distributed over 57 advanced persistent threats. The obtained results show that AUTOMA is able to generate the relevant hypothesis (top 3), with a large reduction rate (up to 99%) and fast execution time (up to 8 minutes for proposing the relevant hypothesis and 10 seconds for variants generation).
computer science, information systems
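
The abstract names sequence alignment as one ingredient of variant generation but does not give its details. As a rough illustration only, and not AUTOMA's actual method, the sketch below scores how closely a candidate sequence of techniques aligns with a hypothesis using a standard Needleman-Wunsch global alignment. The function names `alignment_score` and `rank_variants`, the scoring weights, and the example technique sequences are all assumptions made for this sketch.

```python
from typing import List, Sequence


def alignment_score(a: Sequence[str], b: Sequence[str],
                    match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score of two technique sequences.

    The weights are illustrative assumptions, not values from the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against an empty sequence costs one gap per element.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            same = a[i - 1] == b[j - 1]
            diagonal = score[i - 1][j - 1] + (match if same else mismatch)
            skip_a = score[i - 1][j] + gap   # element of a aligned to a gap
            skip_b = score[i][j - 1] + gap   # element of b aligned to a gap
            score[i][j] = max(diagonal, skip_a, skip_b)
    return score[-1][-1]


def rank_variants(hypothesis: Sequence[str],
                  candidates: List[List[str]]) -> List[List[str]]:
    """Order candidate sequences from best to worst alignment (hypothetical helper)."""
    return sorted(candidates,
                  key=lambda c: alignment_score(hypothesis, c),
                  reverse=True)


# Illustrative hypothesis written as MITRE ATT&CK technique IDs (assumed data).
hypothesis = ["T1566", "T1059", "T1003"]
candidates = [["T1190"], ["T1566", "T1204", "T1003"]]
ranked = rank_variants(hypothesis, candidates)
```

In this sketch a candidate that swaps a single step keeps a high score, while an unrelated sequence is pushed to the bottom of the ranking. A real system would likely also weight substitutions by how closely two techniques are related, which is not modeled here.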