Nip in the Bud: Forecasting and Interpreting Post-exploitation Attacks in Real-time through Cyber Threat Intelligence Reports

Tiantian Zhu, Jie Ying, Tieming Chen, Chunlin Xiong, Wenrui Cheng, Qixuan Yuan, Aohan Zheng, Mingqi Lv, Yan Chen
2024-05-05
Abstract: Advanced Persistent Threat (APT) attacks have caused significant damage worldwide. Enterprises deploy various Endpoint Detection and Response (EDR) systems to fight potential threats. However, EDR suffers from high false positives. To avoid disrupting normal operations, analysts must investigate and filter detection results before taking countermeasures, and the heavy manual labor and alarm fatigue this entails cause analysts to miss the optimal response time, leading to information leakage and destruction. We therefore propose Endpoint Forecasting and Interpreting (EFI), a real-time attack forecasting and interpretation system that automatically predicts the attacker's next move during post-exploitation, explains it at the technique level, and then dispatches strategies to the EDR for advance reinforcement. First, we use Cyber Threat Intelligence (CTI) reports to extract attack scene graphs (ASGs) that can be mapped to low-level system logs to enrich attack samples. Second, we build a serialized graph forecast model and combine it with the attack provenance graph (APG) provided by the EDR to generate an attack forecast graph (AFG) that predicts the next move. Finally, we use attack template graphs (ATGs) and a graph alignment plus algorithm for technique-level interpretation, automatically dispatching strategies for the EDR to reinforce the system in advance. EFI avoids the impact of existing EDR false positives and reduces the system's attack surface without affecting normal operations. We collect a total of 3,484 CTI reports, generate 1,429 ASGs, label 8,000 sentences, tag 10,451 entities, and construct 256 ATGs. Experimental results on both the DARPA Engagement dataset and a large-scale CTI dataset show that the alignment score between the AFG predicted by EFI and the real attack graph can exceed 0.8, and that EFI's forecast and interpretation precision reaches 91.8%.
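The interpretation step in the abstract (matching a forecast graph against attack template graphs and dispatching a reinforcement strategy to the EDR) can be illustrated with a minimal sketch. The summary does not describe how the graph alignment plus algorithm works, so the code swaps in a much simpler score: the fraction of a template's typed edges (source type, operation, destination type) that also appear in the forecast graph. The template names, edge labels, the `STRATEGIES` mapping, and the 0.5 threshold are all hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

# A typed edge: (source entity type, operation, destination entity type).
Edge = Tuple[str, str, str]

# Hypothetical attack template graphs (ATGs), each reduced to a set of typed edges.
TEMPLATES: Dict[str, Set[Edge]] = {
    "credential_dumping": {
        ("process", "read", "file:lsass_dump"),
        ("process", "write", "file:lsass_dump"),
    },
    "exfiltration_over_http": {
        ("process", "read", "file:document"),
        ("process", "connect", "socket:remote"),
    },
}

# Hypothetical reinforcement strategies that could be dispatched to an EDR.
STRATEGIES: Dict[str, str] = {
    "credential_dumping": "block writes to memory-dump files",
    "exfiltration_over_http": "block outbound connections from the process",
}


def alignment_score(forecast: Set[Edge], template: Set[Edge]) -> float:
    """Fraction of template edges present in the forecast graph (simplified stand-in,
    not the paper's graph alignment plus algorithm)."""
    if not template:
        return 0.0
    return len(forecast & template) / len(template)


def interpret(forecast: Set[Edge], threshold: float = 0.5) -> Optional[Tuple[str, float, str]]:
    """Return (template name, score, strategy) for the best-matching template,
    or None if no template reaches the (arbitrary) threshold."""
    best_name, best_score = None, 0.0
    for name, template in TEMPLATES.items():
        score = alignment_score(forecast, template)
        if score > best_score:
            best_name, best_score = name, score
    if best_name is None or best_score < threshold:
        return None
    return best_name, best_score, STRATEGIES[best_name]


if __name__ == "__main__":
    forecast_graph = {
        ("process", "read", "file:document"),
        ("process", "connect", "socket:remote"),
        ("process", "fork", "process"),
    }
    print(interpret(forecast_graph))
```

Here the forecast contains both edges of the exfiltration template, so that template wins with a score of 1.0 and its strategy is returned; a forecast that matches no template well enough yields `None`, meaning nothing is dispatched.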
Cryptography and Security
What problem does this paper attempt to address?
This paper addresses how to predict and interpret an attacker's next move after exploitation in real time, so that an Endpoint Detection and Response (EDR) system can avoid the impact of false positives, respond faster, and reduce the system's attack surface. Specifically, the paper targets four main problems:

1. **Insufficient APT attack samples**: Because APT attacks are complex, simulating different attack techniques and collecting the corresponding logs is very time-consuming.
2. **Semantic gap**: There is a large semantic gap between CTI reports (natural-language descriptions) and low-level system logs, which makes it difficult to automatically extract accurate attack scene graphs (ASGs).
3. **Difficulty of real-time attack prediction**: Relying on security analysts' experience to judge the attacker's next move is not accurate enough, and the diversity of attack techniques makes decisions harder.
4. **Lack of technique-level explanations**: Predictions from learning-based models usually lack technique-level explanations, so analysts cannot act on them directly.

To solve these problems, the authors propose EFI (Endpoint Forecasting and Interpreting), a real-time attack prediction and interpretation system. Its main contributions are:
- **The first automated real-time attack prediction and interpretation**, which avoids the false positives of existing EDR systems and significantly reduces the system's attack surface without affecting normal operations.
- **An ASG extraction module**, which extracts ASGs from CTI reports at scale through a heuristic natural language processing (NLP) pipeline to bridge the semantic gap.
- **An AFG generation module**, which uses a serialized graph prediction model to predict subgraphs while capturing node attributes, edge attributes, and the temporal order of edges to maximize prediction accuracy.
- **ATGs built from Atomic Red Team techniques**, together with a novel graph alignment plus algorithm that provides technique-level explanations for AFGs, helping EDR systems strengthen defenses in advance.

### Formula presentation

1. **Similarity calculation formula**:
   \[ \text{Sim}(N, M)=\frac{\text{sim}(N_{\text{name}}, M_{\text{name}})-|N_{\text{index}} - M_{\text{index}}|}{W_d}-\frac{|N_{\text{type}} - M_{\text{type}}|}{W_t} \]
   where \( N \) and \( M \) are two entities, the subscripts name, index, and type refer to each entity's name, index, and type, and \( W_d \) and \( W_t \) are preset weights.
2. **Probability distribution of the graph prediction model**:
   \[ p(S_\pi, C_\pi)=\prod_{i = 1}^{n + 1}p(C_\pi^i|S_\pi^{<i}, C_\pi^{<i})\,p(S_\pi^i|C_\pi^i, S_\pi^{<i}, C_\pi^{<i}) \]
   where \( S_\pi \) and \( C_\pi \) are the adjacency vectors and types of the nodes, respectively, and \( \pi \) is the permutation order of the nodes.

Through these methods, EFI can predict the attacker's next move before it is carried out and provide technique-level explanations, helping enterprises and security analysts respond more effectively to APT attacks.
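The two formulas above can be made concrete with a small sketch. For the similarity formula, the code follows the expression exactly as written; the summary does not specify the name-similarity function sim(·,·) or how a type difference is quantified, so it assumes difflib's `SequenceMatcher.ratio()` for names and a 0/1 penalty for mismatched types, with arbitrary placeholder weights. For the probability formula, it evaluates the autoregressive factorization for one node ordering π, including the (n+1)-th end-of-sequence step, under a hypothetical toy model: a node's type depends only on the previous node's type, and each adjacency vector is a set of independent Bernoulli edges to earlier nodes. The tables `TYPE_TRANSITIONS` and `EDGE_PROB` are invented for illustration; in EFI a trained serialized graph forecast model would supply these conditionals.

```python
import math
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

# ---- Formula 1: entity similarity, implemented as written ----
Entity = Tuple[str, int, str]  # (name, index, type)


def entity_similarity(n: Entity, m: Entity, w_d: float = 2.0, w_t: float = 4.0) -> float:
    """Sim(N, M) = (sim(names) - |index gap|) / W_d - |type gap| / W_t.
    Assumptions: sim() is SequenceMatcher.ratio(), type gap is 0 or 1,
    and the default weights are placeholders."""
    name_sim = SequenceMatcher(None, n[0], m[0]).ratio()
    index_gap = abs(n[1] - m[1])
    type_gap = 0 if n[2] == m[2] else 1
    return (name_sim - index_gap) / w_d - type_gap / w_t


# ---- Formula 2: autoregressive factorization over a node ordering ----
START, EOS = "<start>", "<eos>"

# Hypothetical p(C_i | C_{i-1}); each row sums to 1.
TYPE_TRANSITIONS: Dict[str, Dict[str, float]] = {
    START: {"process": 0.9, "file": 0.1},
    "process": {"file": 0.5, "socket": 0.3, "process": 0.1, EOS: 0.1},
    "file": {"process": 0.4, "socket": 0.2, EOS: 0.4},
    "socket": {"process": 0.2, EOS: 0.8},
}
# Hypothetical probability of an edge from an earlier node type to the new node type.
EDGE_PROB: Dict[Tuple[str, str], float] = {
    ("process", "file"): 0.8, ("process", "socket"): 0.7, ("process", "process"): 0.6,
    ("file", "process"): 0.3, ("file", "socket"): 0.1, ("socket", "process"): 0.2,
}
DEFAULT_EDGE_PROB = 0.05


def _log(p: float) -> float:
    return math.log(p) if p > 0.0 else float("-inf")


def sequence_log_prob(types: List[str], adjacency: List[List[int]]) -> float:
    """log p(S, C) = sum over i = 1..n+1 of log p(C_i | <i) + log p(S_i | C_i, <i)."""
    if len(types) != len(adjacency):
        raise ValueError("need one adjacency vector per node")
    log_p, prev_type = 0.0, START
    for i, (c_i, s_i) in enumerate(zip(types + [EOS], adjacency + [[]])):
        # Node-type term, conditioned here only on the previous node's type.
        log_p += _log(TYPE_TRANSITIONS.get(prev_type, {}).get(c_i, 0.0))
        if c_i == EOS:
            break  # step n+1: the adjacency term for EOS is taken as 1 in this toy model
        if len(s_i) != i:
            raise ValueError("node i needs an adjacency entry for each earlier node")
        # Adjacency term: independent Bernoulli edges from each earlier node j to node i.
        for j, has_edge in enumerate(s_i):
            q = EDGE_PROB.get((types[j], c_i), DEFAULT_EDGE_PROB)
            log_p += _log(q if has_edge else 1.0 - q)
        prev_type = c_i
    return log_p


if __name__ == "__main__":
    print(entity_similarity(("cmd.exe", 2, "process"), ("cmd.exe", 2, "file")))
    lp = sequence_log_prob(["process", "file", "socket"], [[], [1], [1, 0]])
    print(math.exp(lp))
```

For the ordering process, file, socket with adjacency vectors `[]`, `[1]`, `[1, 0]`, the product of the seven factors is 0.9 × 0.5 × 0.8 × 0.2 × 0.7 × 0.9 × 0.8 = 0.036288. Working in log space avoids underflow when graphs grow long, and a zero-probability transition simply yields negative infinity.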