Time is of the Essence: Machine Learning-based Intrusion Detection in Industrial Time Series Data

Simon Duque Anton, Lia Ahrens, Daniel Fraunholz, Hans Dieter Schotten
DOI: https://doi.org/10.1109/ICDMW.2018.00008
2018-09-20
Abstract: The Industrial Internet of Things drastically increases connectivity of devices in industrial applications. In addition to the benefits in efficiency, scalability and ease of use, this creates novel attack surfaces. Historically, industrial networks and protocols do not contain means of security, such as authentication and encryption, that are made necessary by this development. Thus, industrial IT-security is needed. In this work, emulated industrial network data is transformed into a time series and analysed with three different algorithms. The data contains labeled attacks, so the performance can be evaluated. Matrix Profiles perform well with almost no parameterisation needed. Seasonal Autoregressive Integrated Moving Average performs well in the presence of noise, requiring parameterisation effort. Long Short Term Memory-based neural networks perform mediocre while requiring a high training- and parameterisation effort.
Machine Learning, Cryptography and Security
What problem does this paper attempt to address?
The paper addresses intrusion detection in Industrial Internet of Things (IIoT) environments by applying machine-learning methods to time-series data. The growing interconnectivity of industrial devices improves efficiency, scalability and ease of use, but it also introduces new attack surfaces. Historically, industrial networks and protocols lack security measures such as authentication and encryption, so dedicated industrial IT security solutions are required. Specifically, the paper focuses on the following issues:

1. **Security threats in industrial networks**: Traditional industrial control systems (such as SCADA systems) were originally physically isolated and had unique network characteristics, which made them hard for attackers to exploit. With the introduction of commercial off-the-shelf (COTS) products and standardized software and hardware, industrial systems have become less unique, the attack surface has grown, and reusable attacks have become more likely.
2. **Lack of intrusion detection systems (IDS) suitable for industrial environments**: Communication patterns in industrial environments differ from those in home and office environments, so existing IDS cannot be applied directly to industrial scenarios. In addition, data for testing industrial IDS applications is scarce, which limits research in this area.
3. **Application of time-series analysis in intrusion detection**: Given the characteristics of industrial network communication, time-series analysis is an effective means of intrusion detection. The challenge is to select time-series algorithms that detect anomalies and accurately identify attacks.
To solve these problems, the paper selects three different time-series anomaly detection algorithms: Matrix Profiles, the Seasonal Autoregressive Integrated Moving Average (SARIMA) model and Long Short-Term Memory (LSTM) networks, and evaluates them on an industrial data set based on the Modbus/TCP protocol. Through these methods, the paper aims to find a scheme that can detect intrusion behaviors in industrial networks efficiently and accurately.

### Formula summary

- **Distance calculation in Matrix Profiles**:

  \[ d(x, y)=\sqrt{\frac{2m(1 - \text{corr}(x, y))}{m}} \]

  where

  \[ \text{corr}(x, y)=\frac{\sum_{i = 1}^{m}x_iy_i - m\mu_x\mu_y}{m\sigma_x\sigma_y} \]

  \[ \mu_x=\frac{\sum_{i = 1}^{m}x_i}{m}, \quad \mu_y=\frac{\sum_{i = 1}^{m}y_i}{m} \]

  \[ \sigma_x^2=\frac{\sum_{i = 1}^{m}x_i^2}{m}-\mu_x^2, \quad \sigma_y^2=\frac{\sum_{i = 1}^{m}y_i^2}{m}-\mu_y^2 \]

- **SARIMA model**:

  \[ Y_t=(1 - U^{-1})^d(1 - U^{-s})^D X_t \]

  \[ A(U^{-1})F(U^{-s})Y_t = D(U^{-1})G(U^{-s})\epsilon_t \]

  where \(\epsilon_t\) is a white-noise process and \(U\) is a shift operator.

Through these formulas and methods, the paper shows how to effectively detect intrusion behaviors in industrial time-series data.
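As a minimal sketch of the Matrix Profile distance above, the following plain-Python code computes corr(x, y) and d(x, y) exactly as written in the formula summary and uses them in a brute-force matrix profile. The function names, the exclusion zone of `m // 2` and the synthetic sine-wave series are assumptions made for illustration and are not taken from the paper; practical implementations use much faster algorithms. Note that the m inside the square root cancels as written, so d(x, y) reduces to \(\sqrt{2(1 - \text{corr}(x, y))}\); the z-normalised Euclidean distance commonly used in the matrix profile literature is \(\sqrt{2m(1 - \text{corr}(x, y))}\), which differs only by the constant factor \(\sqrt{m}\).

```python
import math


def corr(x, y):
    """Pearson correlation of two equal-length subsequences (formula summary)."""
    m = len(x)
    mu_x = sum(x) / m
    mu_y = sum(y) / m
    # max(..., 0.0) guards against tiny negative values caused by rounding.
    sigma_x = math.sqrt(max(sum(v * v for v in x) / m - mu_x ** 2, 0.0))
    sigma_y = math.sqrt(max(sum(v * v for v in y) / m - mu_y ** 2, 0.0))
    dot = sum(a * b for a, b in zip(x, y))
    # Constant subsequences (sigma == 0) are not handled in this sketch.
    return (dot - m * mu_x * mu_y) / (m * sigma_x * sigma_y)


def distance(x, y):
    """d(x, y) as written in the summary: sqrt(2m(1 - corr) / m)."""
    m = len(x)
    return math.sqrt(max(2 * m * (1 - corr(x, y)) / m, 0.0))


def naive_matrix_profile(series, m):
    """Brute-force profile: distance of each length-m subsequence to its
    nearest neighbour, skipping trivially overlapping matches."""
    n = len(series) - m + 1
    exclusion = m // 2  # assumed exclusion zone, not from the paper
    profile = [math.inf] * n
    for i in range(n):
        for j in range(n):
            if abs(i - j) <= exclusion:
                continue
            d = distance(series[i:i + m], series[j:j + m])
            if d < profile[i]:
                profile[i] = d
    return profile


def demo_series():
    """Hypothetical periodic signal with an injected anomaly at t = 60..64."""
    series = [math.sin(2 * math.pi * t / 20) for t in range(120)]
    for t in range(60, 65):
        series[t] += 3.0
    return series


if __name__ == "__main__":
    profile = naive_matrix_profile(demo_series(), m=10)
    peak = max(range(len(profile)), key=profile.__getitem__)
    print("largest profile value at subsequence start", peak)
```

High profile values mark subsequences that have no close match elsewhere in the series, which is how an anomaly such as the injected one would stand out.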
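The first SARIMA equation can be illustrated with a short sketch, assuming that \(U^{-1}\) acts as a backshift, i.e. \(U^{-1}X_t = X_{t-1}\), so that \((1 - U^{-1})\) is ordinary differencing and \((1 - U^{-s})\) is seasonal differencing with period s. The function names and the toy series are hypothetical. Only the differencing step that produces \(Y_t\) is shown; fitting the polynomials A, F, D and G is not covered.

```python
def difference(series, lag=1):
    """Apply (1 - U^{-lag}): y_t = x_t - x_{t-lag} (assumes U^{-1} is a backshift)."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def sarima_difference(x, d, D, s):
    """Compute Y_t = (1 - U^{-1})^d (1 - U^{-s})^D X_t by repeated differencing."""
    y = list(x)
    for _ in range(d):   # d ordinary differences remove a polynomial trend
        y = difference(y, 1)
    for _ in range(D):   # D seasonal differences remove a period-s pattern
        y = difference(y, s)
    return y


def toy_series(length=20):
    """Hypothetical series: linear trend plus a pattern repeating every 4 steps."""
    season = [0, 5, -3, 1]
    return [2 * t + season[t % 4] for t in range(length)]


if __name__ == "__main__":
    # With d = 1, D = 1, s = 4 both the trend and the seasonal pattern cancel.
    print(sarima_difference(toy_series(), d=1, D=1, s=4))
```

Each differencing pass shortens the series by its lag, so the result has `len(x) - d - D * s` values.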