Machine Learning to detect cyber-attacks and discriminating the types of power system disturbances

Diane Tuyizere, Remy Ihabwikuzo
2023-07-07
Abstract: This research proposes a machine learning-based attack detection model for power systems, specifically targeting smart grids. By utilizing data and logs collected from Phasor Measurement Units (PMUs), the model aims to learn system behaviors and effectively identify potential security boundaries. The proposed approach involves four stages: dataset pre-processing, feature selection, model creation, and evaluation. To validate our approach, we used a dataset consisting of 15 separate datasets obtained from different PMUs, relay Snort alarms, and logs. Three machine learning models (Random Forest, Logistic Regression, and K-Nearest Neighbour) were built and evaluated using various performance metrics. The findings indicate that the Random Forest model achieves the highest performance, with an accuracy of 90.56% in detecting power system disturbances, and has the potential to assist operators in decision-making processes.
Machine Learning, Systems and Control
What problem does this paper attempt to address?
This paper addresses cybersecurity in the smart grid. Specifically, it uses machine-learning techniques to detect cyber-attacks on the power system and to distinguish between different types of power system disturbances. The main objectives and contents of the paper are summarized below.

### Research Background and Problems

With the wide adoption of the smart grid, power systems increasingly rely on automation, communication, and information technology systems. These systems improve efficiency and reliability, but they also introduce new security threats. Attackers who compromise them can interrupt the power supply, causing significant losses and even endangering public safety. Detecting and responding to such cyber-attacks effectively has therefore become an urgent problem.

### Core Problems of the Paper

The paper proposes a machine-learning-based attack-detection model that detects potential security threats in the smart grid and accurately distinguishes between different types of power system disturbances. Specifically, the research objectives include:

1. **Utilizing historical data and logs**: Collect data and logs from phasor measurement units (PMUs) and use them to train a machine-learning model to recognize the normal behavior of the system.
2. **Constructing and evaluating the model**: Build three different machine-learning models (random forest, logistic regression, and K-nearest neighbor) and evaluate them with multiple performance metrics (such as accuracy and F1-score).
3. **Optimizing the model performance**: Improve the performance of the model through feature selection and hyper-parameter tuning.

### Main Methods and Steps

The paper adopts the following steps to achieve its objectives:

1. **Data pre-processing**: Clean the original data, remove outliers and infinite values, and use SMOTE (Synthetic Minority Over-sampling Technique) to address class imbalance.
2. **Feature selection**: Use the mutual-information method to select the most relevant features, with the aim of improving the generalization ability of the model.
3. **Model construction and evaluation**: Construct random forest, logistic regression, and K-nearest-neighbor models, and evaluate each using 10-fold cross-validation.
4. **Model optimization**: Perform hyper-parameter tuning on the best-performing model, the random forest, to further improve its performance.

### Experimental Results

The experimental results show that the random forest model performs best in detecting power system disturbances, achieving an accuracy of 90.56%. In addition, a comparison of model performance before and after feature selection shows that the model performs better when using all features, possibly because feature selection led to over-fitting.

### Conclusions

The research shows that the random forest model achieves high accuracy in detecting cyber-attacks on the power system and in classifying different types of disturbances. Future research could combine deep-learning and big-data technologies to further improve the performance and robustness of the model.
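The pre-processing step listed under Main Methods and Steps (dropping infinite values, then oversampling the minority class with SMOTE) can be illustrated with a minimal, standard-library sketch. This is not the paper's code: the function names `drop_non_finite` and `smote_like`, the class labels, and the toy data are all assumptions made for illustration, and the oversampler is a simplified version of the SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours). In practice, a library implementation such as the `SMOTE` class in imbalanced-learn would likely be used instead.

```python
import math
import random


def drop_non_finite(rows, labels):
    """Keep only samples whose features are all finite (no inf or NaN)."""
    kept = [(r, y) for r, y in zip(rows, labels)
            if all(math.isfinite(v) for v in r)]
    return [r for r, _ in kept], [y for _, y in kept]


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic samples, each lying on the segment between a
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Other minority samples, sorted by Euclidean distance to `base`.
        others = sorted((p for p in minority if p is not base),
                        key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data, invented for illustration: two features per sample.
rows = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2],
        [5.0, 5.1], [5.2, 4.9], [float("inf"), 1.0]]
labels = ["normal", "normal", "normal", "normal",
          "attack", "attack", "attack"]

# Step 1: remove the sample containing an infinite value.
rows, labels = drop_non_finite(rows, labels)

# Step 2: oversample the minority class until both classes are the same size.
minority = [r for r, y in zip(rows, labels) if y == "attack"]
n_needed = labels.count("normal") - labels.count("attack")
new_points = smote_like(minority, n_needed, k=1)
rows += new_points
labels += ["attack"] * len(new_points)

print(labels.count("normal"), labels.count("attack"))
```

Oversampling of this kind is normally applied only to the training portion of each cross-validation fold, so that synthetic points derived from training samples do not leak into the evaluation data.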
### Formula Representation

The paper involves few formulas, but the following are commonly used to describe the model evaluation metrics:

- **Accuracy**: \[ \text{Accuracy}=\frac{\text{TP}+\text{TN}}{\text{TP}+\text{TN}+\text{FP}+\text{FN}} \]
- **Precision**: \[ \text{Precision}=\frac{\text{TP}}{\text{TP}+\text{FP}} \]
- **Recall**: \[ \text{Recall}=\frac{\text{TP}}{\text{TP}+\text{FN}} \]
- **F1-score**: \[ F1 = 2\cdot\frac{\text{Precision}\cdot\text{Recall}}{\text{Precision}+\text{Recall}} \]

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.

Through these methods and techniques, the paper successfully demonstrates the machine...
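As a worked illustration of the four metrics above, the following sketch computes them from true and predicted labels for a two-class case. The function name `binary_metrics`, the label names, and the toy predictions are assumptions for illustration only and are not results from the paper. A multi-class disturbance classifier would typically compute these per class and then average them.

```python
def binary_metrics(y_true, y_pred, positive="attack"):
    """Compute accuracy, precision, recall, and F1 from two label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels, invented for illustration: TP=3, FN=1, FP=1, TN=5.
y_true = ["attack"] * 4 + ["normal"] * 6
y_pred = ["attack", "attack", "attack", "normal",
          "attack", "normal", "normal", "normal", "normal", "normal"]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

With these toy labels, accuracy is (3 + 5) / 10 = 0.8, while precision, recall, and F1 are all 3 / 4 = 0.75.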