Online Cyber-Attack Detection in Smart Grid: A Reinforcement Learning Approach

Mehmet Necip Kurt, Oyetunji Ogundijo, Chong Li, Xiaodong Wang
DOI: https://doi.org/10.1109/TSG.2018.2878570
2018-09-14
Abstract: Early detection of cyber-attacks is crucial for a safe and reliable operation of the smart grid. In the literature, outlier detection schemes making sample-by-sample decisions and online detection schemes requiring perfect attack models have been proposed. In this paper, we formulate the online attack/anomaly detection problem as a partially observable Markov decision process (POMDP) problem and propose a universal robust online detection algorithm using the framework of model-free reinforcement learning (RL) for POMDPs. Numerical studies illustrate the effectiveness of the proposed RL-based algorithm in timely and accurate detection of cyber-attacks targeting the smart grid.
Machine Learning, Cryptography and Security
What problem does this paper attempt to address?
This paper addresses the online detection of cyber-attacks in the smart grid. The authors aim to develop a method that detects such attacks in a timely and accurate manner. The specific points are:

1. **Importance of Early Detection**:
   - Smart grids rely on advanced control and communication technologies, which makes them vulnerable to malicious cyber-attacks.
   - Early detection of these attacks is crucial for safe and reliable operation, because a malfunction or anomaly in any part of the grid may cause significant damage to the entire system.
2. **Limitations of Existing Methods**:
   - Outlier detection schemes in the literature usually make sample-by-sample decisions, while online detection schemes require accurate attack models.
   - Both are limited in practice, because the attacker's strategies and capabilities may be completely unknown and cannot be modeled accurately in advance.
3. **Proposed New Method**:
   - The paper models the online attack/anomaly detection problem as a partially observable Markov decision process (POMDP) and proposes a universal, robust online detection algorithm based on model-free reinforcement learning (RL).
   - The algorithm does not need to know the attack model in advance, which makes it widely applicable and able to detect new, previously unknown types of attacks.
4. **Objectives**:
   - The objective is to minimize the average detection delay and the false alarm rate, that is, to decide accurately whether the system is still in normal operation or an attack has already occurred.
   - Adjusting the cost weights assigned to different events (such as false alarms and detection delays) sets the balance between detection speed and accuracy.
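The trade-off between false alarms and detection delay can be made concrete with a small sketch. The cost structure below (a fixed penalty for stopping before the attack, a per-step penalty for every step of delay after it) and the names `episode_cost`, `false_alarm_cost`, and `delay_cost` are illustrative assumptions, not taken from the summary above; the weights are arbitrary.

```python
def episode_cost(stop_time, attack_time, false_alarm_cost=1.0, delay_cost=0.25):
    """Cost of declaring an attack at `stop_time` when it began at `attack_time`.

    Assumed cost structure (hypothetical, for illustration only):
    a fixed penalty for a false alarm, a linear penalty for detection delay.
    """
    if stop_time < attack_time:
        # Alarm raised before any attack has occurred: false alarm.
        return false_alarm_cost
    # Alarm raised at or after the attack: pay for each step of delay.
    return delay_cost * (stop_time - attack_time)


if __name__ == "__main__":
    attack_time = 100
    for stop_time in (90, 100, 102, 110):
        print(stop_time, episode_cost(stop_time, attack_time))
```

Raising `delay_cost` relative to `false_alarm_cost` in this sketch favors detectors that stop earlier at the risk of more false alarms; lowering it favors more cautious detectors.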
### Summary of Mathematical Formulas

- **State-Space Equations**:
  \[ x_t = A x_{t-1} + v_t \]
  \[ y_t = H x_t + w_t \]
  where \(x_t\) is the system state, \(y_t\) is the measurement, \(A\) is the system matrix, \(H\) is the measurement matrix, and \(v_t\) and \(w_t\) are the process noise and measurement noise, with variances \(\sigma_v^2\) and \(\sigma_w^2\) respectively.
- **Kalman Filter Update Steps**:
  - Prediction Step:
    \[ \hat{x}_{t|t-1} = A \hat{x}_{t-1|t-1} \]
    \[ F_{t|t-1} = A F_{t-1|t-1} A^T + \sigma_v^2 I_N \]
  - Measurement Update Step:
    \[ G_t = F_{t|t-1} H^T (H F_{t|t-1} H^T + \sigma_w^2 I_K)^{-1} \]
    \[ \hat{x}_{t|t} = \hat{x}_{t|t-1} + G_t (y_t - H \hat{x}_{t|t-1}) \]
    \[ F_{t|t} = F_{t|t-1} - G_t H F_{t|t-1} \]
  Here \(G_t\) is the Kalman gain and \(F\) is the covariance of the state estimation error.
- **Negative Log-Likelihood Estimation**:
  \[ \eta_t = (y_t - H \hat{x}_{t|t})^T (y_t - H \hat{x}_{t|t}) \]

Through these formulas, the authors construct a framework that uses reinforcement learning to handle the attack detection problem in an uncertain environment, thereby improving the accuracy and robustness of detection.
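To illustrate how the Kalman recursion and the statistic \(\eta_t\) fit together, here is a minimal sketch for the scalar special case (\(N = K = 1\)), where \(A\), \(H\), and the covariances reduce to plain numbers. The function names, the default parameter values, and the test sequence are assumptions chosen for illustration; this is not the authors' implementation.

```python
def kalman_step(x_prev, f_prev, y, a=1.0, h=1.0, var_v=0.01, var_w=0.04):
    """One predict/update cycle of the scalar (N = K = 1) Kalman filter.

    Returns the filtered state, its error variance, and eta_t.
    All default parameter values are illustrative assumptions.
    """
    # Prediction step
    x_pred = a * x_prev
    f_pred = a * f_prev * a + var_v
    # Measurement update step
    gain = f_pred * h / (h * f_pred * h + var_w)
    x_filt = x_pred + gain * (y - h * x_pred)
    f_filt = f_pred - gain * h * f_pred
    # Squared residual against the filtered estimate (eta_t above)
    eta = (y - h * x_filt) ** 2
    return x_filt, f_filt, eta


def residual_statistics(measurements, x0, f0):
    """Run the filter over a measurement sequence and collect eta_t."""
    x, f, etas = x0, f0, []
    for y in measurements:
        x, f, eta = kalman_step(x, f, y)
        etas.append(eta)
    return etas


if __name__ == "__main__":
    # Hypothetical sequence: two nominal samples, then an abrupt jump.
    print(residual_statistics([1.0, 1.0, 3.0], x0=1.0, f0=0.0))
```

In this toy sequence \(\eta_t\) stays at zero while the measurements match the prediction and grows sharply when the measurement jumps, which is the kind of signal an online detector can act on.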