Comparative study of ML models for IIoT intrusion detection: impact of data preprocessing and balancing

Abdulrahman Mahmoud Eid, Bassel Soudan, Ali Bou Nasif, MohammadNoor Injadat
DOI: https://doi.org/10.1007/s00521-024-09439-x
2024-02-11
Neural Computing and Applications
Abstract: This study investigates the effectiveness of six prominent machine learning models (random forest, decision trees, K-nearest neighbor, logistic regression, support vector machines, and Naïve Bayes) for intrusion detection systems in industrial Internet of Things (IIoT) environments. The evaluation covers the effects of data preprocessing techniques, including feature engineering, data normalization, recoding, and missing data mitigation. The research also examines dataset balancing, comparing the effects of six different techniques on model performance. The investigations are conducted using the domain-specific WUSTL-IIOT-2021 dataset, which captures the unique characteristics of IIoT data. The study further investigates multi-class attack identification, using an innovative SMOTE-based multi-class balancing approach to tackle dataset imbalances. The results indicate that data preprocessing and intelligent dataset balancing produce consistent improvements in the classification performance of the selected models across binary and multi-class classification tasks. Random forest emerges as the standout algorithm, delivering consistently high performance with computational efficiency.
computer science, artificial intelligence
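The abstract refers to a SMOTE-based multi-class balancing approach without giving its details. As a rough illustration of the general idea only, and not the authors' method, the sketch below oversamples every minority class up to the size of the largest class by interpolating between a sample and one of its nearest same-class neighbours. The function names (`smote_oversample`, `balance_multiclass`), the parameter `k`, the toy feature vectors, and the class labels are all hypothetical and are not taken from the paper or from the WUSTL-IIOT-2021 dataset.

```python
import math
import random
from collections import Counter


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each lying on the segment between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours (the core SMOTE interpolation step)."""
    if len(minority) < 2:
        raise ValueError("need at least two samples of a class to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other samples of the same class by Euclidean distance.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance_multiclass(X, y, k=3, seed=0):
    """Oversample each class up to the size of the largest class.
    This equal-size target is an assumption made for illustration."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = list(X), list(y)
    for label, rows in by_class.items():
        need = target - len(rows)
        if need == 0:
            continue
        X_out.extend(smote_oversample(rows, need, k=k, seed=seed))
        y_out.extend([label] * need)
    return X_out, y_out


if __name__ == "__main__":
    # Toy, already-normalized feature vectors with hypothetical labels.
    X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3],
         [5.0, 5.0], [5.2, 5.1],
         [9.0, 1.0], [9.1, 1.2], [9.3, 0.9]]
    y = ["normal"] * 4 + ["attack_a"] * 2 + ["attack_b"] * 3
    X_bal, y_bal = balance_multiclass(X, y)
    print(Counter(y_bal))
```

In practice, balancing of this kind is normally applied only to the training split, after normalization, so that synthetic samples do not leak into the evaluation data. A maintained library implementation would usually be preferable to a hand-written one.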
What problem does this paper attempt to address?