IT Intrusion Detection Using Statistical Learning and Testbed Measurements
Xiaoxuan Wang,Rolf Stadler
2024-02-20
Abstract:We study automated intrusion detection in an IT infrastructure, specifically
the problem of identifying the start of an attack, the type of attack, and the
sequence of actions an attacker takes, based on continuous measurements from
the infrastructure. We apply statistical learning methods, including Hidden
Markov Model (HMM), Long Short-Term Memory (LSTM), and Random Forest Classifier
(RFC) to map sequences of observations to sequences of predicted attack
actions. In contrast to most related research, we have abundant data to train
the models and evaluate their predictive power. The data comes from traces we
generate on an in-house testbed where we run attacks against an emulated IT
infrastructure. Central to our work is a machine-learning pipeline that maps
measurements from a high-dimensional observation space to a space of low
dimensionality or to a small set of observation symbols. Investigating
intrusions in offline as well as online scenarios, we find that both HMM and
LSTM can be effective in predicting attack start time, attack type, and attack
actions. If sufficient training data is available, LSTM achieves higher
prediction accuracy than HMM. HMM, on the other hand, requires less
computational resources and less training data for effective prediction. Also,
we find that the methods we study benefit from data produced by traditional
intrusion detection systems like SNORT.
Machine Learning,Cryptography and Security