Abstract: While Endpoint Detection and Response (EDR) systems are able to efficiently monitor threats by comparing static rules to the event stream, their inability to incorporate past system context leads to high rates of false alarms. Recent work has demonstrated Provenance-based Intrusion Detection Systems (Prov-IDS) that can examine the causal relationships between abnormal behaviors to improve threat classification. However, employing these Prov-IDS in practical settings remains difficult: state-of-the-art neural-network-based systems are only fast in a fully offline deployment model that increases attacker dwell time, while simultaneously using simplified and less accurate provenance graphs to reduce memory consumption. Thus, today's Prov-IDS cannot operate effectively in the real-time streaming setting required for commercial EDR viability. This work presents the design and implementation of ORCHID, a novel Prov-IDS that performs fine-grained detection of process-level threats over a real-time event stream. ORCHID takes advantage of the unique immutable properties of versioned provenance graphs to iteratively embed the entire graph in a sequential RNN model while only consuming a fraction of the computation and memory costs. We evaluate ORCHID on four public datasets, including DARPA TC, to show that ORCHID can provide competitive classification performance while eliminating detection lag and reducing memory consumption by two orders of magnitude.

What problem does this paper attempt to address?

This paper addresses the limitations that keep existing Provenance-based Intrusion Detection Systems (Prov-IDS) from being used in practice. Specifically:

1. **High false positive rate**: Traditional Endpoint Detection and Response (EDR) systems efficiently monitor threats by comparing event streams with static rules, but they cannot incorporate past system context, which results in a relatively high false positive rate.
2. **Insufficient real-time processing ability**: Current Prov-IDS can analyze causal relationships to improve threat classification, but they face challenges in actual deployment. State-of-the-art neural-network-based Prov-IDS are only fast in a fully offline mode, which increases attacker dwell time, and they rely on simplified, less accurate provenance graphs to reduce memory consumption.
3. **High memory and computing resource consumption**: Existing Prov-IDS require a large amount of memory and computing resources to store and analyze complete provenance graphs. For example, analyzing a versioned provenance graph requires 143.7 GB of memory.

To solve these problems, the paper proposes ORCHID (Online Root Cause Host Intrusion Detection System), a new type of Prov-IDS that aims to achieve real-time, fine-grained, process-level threat detection. The main innovations of ORCHID include:

- **Real-time embedding and classification**: ORCHID takes advantage of the immutable characteristics of versioned provenance graphs and iteratively embeds the entire graph through a Recurrent Neural Network (RNN) model while consuming only a small fraction of the computing and memory costs.
- **Low memory footprint**: Compared with existing methods, ORCHID only maintains the latest version of each system entity, thereby significantly reducing the memory footprint.
- **Long-term dependency capture**: By introducing "root node" embedding, ORCHID is able to capture long-term dependency relationships and enhance its ability to recognize attack behaviors.

Through an evaluation on four public datasets (including DARPA TC), the paper shows that ORCHID eliminates detection delay and reduces memory consumption by two orders of magnitude while maintaining competitive classification performance.

### Formula presentation

ORCHID uses the following formula for the embedding update of system entities:

\[ D[v_j] = f(D[v_j], D[v_i]) \]

where \(D\) is an internal dictionary that maps each vertex to its latest embedding, and \(f\) is an RNN model.

To capture long-term dependency relationships, ORCHID modifies the RNN update function and introduces "root node" embedding:

\[ h_i = w \cdot h_{i-1} + b \cdot x_i + c \cdot \left[\frac{1}{n}\sum_{i=0}^{n} r_i\right] \]

where \(\{r_i\}\) is the set of root nodes associated with element \(i\) in the sequence, and \(c\) is a learnable model weight used to balance the information introduced by the root embedding.

Through these innovations, ORCHID achieves efficient real-time threat detection and significantly reduces resource consumption.
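The two update rules above can be illustrated with a minimal Python sketch. It is not ORCHID's implementation: the function names (`rnn_update`, `process_stream`), the scalar weights `w`, `b`, `c` applied element-wise, the zero-vector default for unseen vertices, and the `roots_of` mapping from a vertex to its root vertices are all assumptions made for illustration. A real model would use learned weight matrices and likely a nonlinearity. The sketch takes \(h_{i-1}\) to be the destination's current embedding \(D[v_j]\) and \(x_i\) to be the source's embedding \(D[v_i]\), and averages the root embeddings over however many roots are supplied.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Vector = List[float]


def mean_vector(vectors: Sequence[Vector], dim: int) -> Vector:
    """Element-wise mean of the given vectors; a zero vector if there are none."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def rnn_update(h_prev: Vector, x: Vector, roots: Sequence[Vector],
               w: float = 0.5, b: float = 0.5, c: float = 0.1) -> Vector:
    """h_i = w*h_{i-1} + b*x_i + c*mean(roots), with scalar weights (an assumption)."""
    r_bar = mean_vector(roots, len(h_prev))
    return [w * h + b * xi + c * r for h, xi, r in zip(h_prev, x, r_bar)]


def process_stream(events: Iterable[Tuple[str, str]],
                   initial: Dict[str, Vector],
                   roots_of: Dict[str, List[str]],
                   dim: int = 2) -> Dict[str, Vector]:
    """Apply D[v_j] = f(D[v_j], D[v_i]) for each streamed edge (v_i -> v_j).

    D keeps exactly one embedding per vertex (its latest), so memory grows
    with the number of vertices rather than the number of events.
    """
    D: Dict[str, Vector] = {k: list(v) for k, v in initial.items()}
    zero = [0.0] * dim
    for src, dst in events:
        h_prev = D.get(dst, zero)   # latest embedding of the destination
        x = D.get(src, zero)        # latest embedding of the source
        roots = [D[r] for r in roots_of.get(dst, []) if r in D]
        D[dst] = rnn_update(h_prev, x, roots)  # overwrite in place
    return D


if __name__ == "__main__":
    # Hypothetical toy stream: a shell process writes to a file twice.
    start = {"bash": [1.0, 0.0], "file": [0.0, 1.0]}
    stream = [("bash", "file"), ("bash", "file")]
    final = process_stream(stream, start, roots_of={"file": ["bash"]})
    print(final)
```

Because each event overwrites the destination's entry instead of appending a new graph version, the dictionary stays at one embedding per entity regardless of stream length, which is the property the summary credits for the low memory footprint.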

ORCHID: Streaming Threat Detection over Versioned Provenance Graphs
