Incorporating Gradients to Rules: Towards Lightweight, Adaptive Provenance-based Intrusion Detection

Lingzhi Wang, Xiangmin Shen, Weijian Li, Zhenyuan Li, R. Sekar, Han Liu, Yan Chen
DOI: https://doi.org/10.14722/ndss.2025.23822
2024-09-20
Abstract: As cyber attacks grow increasingly sophisticated and stealthy, it becomes more imperative and challenging to detect intrusions among normal behaviors. Through fine-grained causality analysis, provenance-based intrusion detection systems (PIDS) have demonstrated a promising capacity to distinguish benign from malicious behaviors, attracting widespread attention from both industry and academia. Among diverse approaches, rule-based PIDS stand out due to their lightweight overhead, real-time capabilities, and explainability. However, existing rule-based systems suffer from low detection accuracy, especially high false-alarm rates, due to the lack of fine-grained rules and environment-specific configurations. In this paper, we propose CAPTAIN, a rule-based PIDS capable of automatically adapting to diverse environments. Specifically, we propose three adaptive parameters to adjust the detection configuration with respect to nodes, edges, and alarm generation thresholds. We build a differentiable tag propagation framework and utilize the gradient descent algorithm to optimize these adaptive parameters based on the training data. We evaluate our system using data from DARPA Engagements and simulated environments. The evaluation results demonstrate that CAPTAIN enhances rule-based PIDS with learning capabilities, resulting in improved detection accuracy, reduced detection latency, lower runtime overhead, and more interpretable detection procedures and results compared to the state-of-the-art (SOTA) PIDS.
Cryptography and Security
What problem does this paper attempt to address?
The paper addresses three problems that existing rule-based provenance-based intrusion detection systems (PIDS) face in real deployments: low detection accuracy, a high false-positive rate, and a lack of environmental adaptability. Specifically:

1. **Low Detection Accuracy and High False-Positive Rate**: Existing rule-based PIDS rely on overly simple, general rules that cannot adapt to different environments, so detection ends up either too strict (generating a large number of false positives) or too lax (missing real attacks). For example, when dealing with "gray" nodes in cloud services (such as the IP addresses of FaaS platforms), these systems cannot reliably distinguish benign from malicious behavior, leading to false positives or false negatives.
2. **Lack of Environmental Adaptability**: Traditional rule-based PIDS usually rely on static configurations and manual adjustments, which makes it difficult to tune rules to a specific environment. As a result, they perform poorly in complex and changing network environments, especially in security operation centers (SOCs), where analysts must spend a great deal of time configuring models by hand.

To solve these problems, the paper proposes a new rule-based PIDS named CAPTAIN, which automatically adjusts the rule configuration by introducing adaptive parameters and optimizing them with gradient descent, thereby improving detection accuracy and reducing the false-positive rate. Specifically, CAPTAIN introduces three adaptive parameters:

- **Tag Initialization Parameter (A)**: Determines the initial tags of system entities (nodes).
- **Tag Propagation Rate Parameter (G)**: Adjusts how strongly system events (edges) affect tags.
- **Alarm Generation Threshold Parameter (T)**: Adjusts the rules that trigger alarms.
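The interplay of the three parameters can be illustrated with a minimal Python sketch of tag propagation, in the abstract's terminology. It is not CAPTAIN's implementation: the tag scale (1.0 = fully trusted, 0.0 = untrusted), the update rule, the `AdaptiveParams` container, the `propagate` function, and the example node names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AdaptiveParams:
    """Hypothetical container for the three adaptive parameters (not CAPTAIN's API)."""
    A: dict = field(default_factory=dict)  # node -> initial tag in [0, 1]
    G: dict = field(default_factory=dict)  # (src, dst) -> propagation rate in [0, 1]
    T: float = 0.5                         # alarm threshold on the updated tag


def propagate(events, params, default_a=1.0, default_g=1.0):
    """Replay (src, dst) events in order; return final tags and raised alarms.

    Assumed convention: 1.0 means fully trusted, 0.0 means untrusted.
    """
    tags, alarms = {}, []
    for src, dst in events:
        s = tags.setdefault(src, params.A.get(src, default_a))
        d = tags.setdefault(dst, params.A.get(dst, default_a))
        g = params.G.get((src, dst), default_g)
        # Assumed update rule: pull dst toward the less trusted tag, scaled by g.
        new_tag = d + g * (min(s, d) - d)
        tags[dst] = new_tag
        if new_tag < params.T:
            alarms.append((src, dst, new_tag))
    return tags, alarms


# Benign traffic from a "gray" FaaS address (all names are made up).
FAAS_IP = "ip:203.0.113.7"
events = [(FAAS_IP, "proc:nginx"), ("proc:nginx", "file:/var/www/index.html")]

# Conservative configuration: the gray IP starts out untrusted.
strict = AdaptiveParams(A={FAAS_IP: 0.0})
_, strict_alarms = propagate(events, strict)

# Tuned configuration: higher initial tag for the IP, damped propagation on its edge.
tuned = AdaptiveParams(A={FAAS_IP: 0.8}, G={(FAAS_IP, "proc:nginx"): 0.5})
tuned_tags, tuned_alarms = propagate(events, tuned)

print(len(strict_alarms), len(tuned_alarms))
```

In this toy setup the conservative configuration raises an alarm on every benign event, while the tuned configuration raises none. The rules themselves are identical in both runs; only the values of A and G differ, which is the kind of environment-specific adjustment the summary describes.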
Through these adaptive parameters, CAPTAIN can automatically learn and optimize the rule configuration during training, thereby achieving more accurate detection. CAPTAIN also retains the advantages of rule-based PIDS: it is lightweight, has low latency, and is interpretable.

### Main Contributions of the Paper

1. **Proposing CAPTAIN**: A rule-based PIDS that can automatically adjust its rules, combining the advantages of traditional rule-based systems (lightweight, low latency, interpretability) with the adaptive capabilities of machine-learning systems.
2. **Designing a Differentiable Tag Propagation Framework**: Transforming the rule-based PIDS into a differentiable function and using the gradient descent algorithm to optimize the adaptive parameters, thereby reducing false positives.
3. **System Evaluation**: Evaluating CAPTAIN in multiple scenarios, including the DARPA datasets and a simulated-environment dataset. The experimental results show that CAPTAIN reduces false positives by more than 90% compared to traditional rule-based PIDS and performs well in terms of detection accuracy, runtime overhead, and latency.

Through these improvements, CAPTAIN aims to overcome the limitations of existing rule-based PIDS and provide a more flexible and accurate intrusion detection solution.
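How gradient descent can tune such parameters is sketched below for a single benign event. Everything here is an assumption for illustration: the one-hop tag update, the sigmoid-relaxed alarm, the sharpness `k`, the L2 penalty that pulls parameters back toward conservative defaults, and the hand-derived gradients (an autodiff framework would normally compute them). It is not the paper's loss or training procedure.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def soft_alarm(a, g, T=0.5, k=10.0, d=1.0):
    """Differentiable stand-in for the hard rule `tag < T`.

    a: initial tag of the source node, g: propagation rate of the edge,
    d: current tag of the destination (1.0 = fully trusted, by assumption).
    Returns (soft alarm score in (0, 1), updated destination tag).
    """
    tag = d + g * (a - d)  # assumed one-hop propagation rule
    return sigmoid(k * (T - tag)), tag


def train(a=0.0, g=1.0, T=0.5, k=10.0, d=1.0, lr=0.1, reg=0.01, steps=500):
    """Minimize soft_alarm + reg * squared distance from the defaults."""
    a0, g0 = a, g  # conservative defaults the penalty pulls toward
    for _ in range(steps):
        p, _tag = soft_alarm(a, g, T, k, d)
        dloss_dtag = -k * p * (1.0 - p)  # derivative of sigmoid(k * (T - tag))
        grad_a = dloss_dtag * g + 2.0 * reg * (a - a0)        # d tag / d a = g
        grad_g = dloss_dtag * (a - d) + 2.0 * reg * (g - g0)  # d tag / d g = a - d
        # Gradient step, clipped so both parameters stay in [0, 1].
        a = min(1.0, max(0.0, a - lr * grad_a))
        g = min(1.0, max(0.0, g - lr * grad_g))
    return a, g


before, _ = soft_alarm(0.0, 1.0)
a_opt, g_opt = train()
after, tag_opt = soft_alarm(a_opt, g_opt)
print(f"soft alarm before={before:.3f} after={after:.3f}")
```

Because the event is known to be benign, the soft alarm score itself serves as the false-positive loss. The penalty term is one possible way to keep the optimizer from relaxing the configuration more than the training data justifies.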