Evaluating the Impact of Data Preprocessing Techniques on the Performance of Intrusion Detection Systems

Kelson Carvalho Santos,Rodrigo Sanches Miani,Flávio de Oliveira Silva,Miani, Rodrigo Sanches
DOI: https://doi.org/10.1007/s10922-024-09813-z
2024-03-23
Journal of Network and Systems Management
Abstract:The development of Intrusion Detection Systems using Machine Learning techniques (ML-based IDS) has emerged as an important research topic in the cybersecurity field. However, there is a noticeable absence of systematic studies to comprehend the usability of such systems in real-world applications. This paper analyzes the impact of data preprocessing techniques on the performance of ML-based IDS using two public datasets, UNSW-NB15 and CIC-IDS2017. Specifically, we evaluated the effects of data cleaning, encoding, and normalization techniques on the performance of binary and multiclass intrusion detection models. This work investigates the impact of data preprocessing techniques on the performance of ML-based IDS and how the performance of different ML-based IDS is affected by data preprocessing techniques. To this end, we implemented a machine learning pipeline to apply the data preprocessing techniques in different scenarios to answer such questions. The findings analyzed using the Friedman statistical test and Nemenyi post-hoc test revealed significant differences in groups of data preprocessing techniques and ML-based IDS, according to the evaluation metrics. However, these differences were not observed in multiclass scenarios for data preprocessing techniques. Additionally, ML-based IDS exhibited varying performances in binary and multiclass classifications. Therefore, our investigation presents insights into the efficacy of different data preprocessing techniques for building robust and accurate intrusion detection models.
computer science, information systems,telecommunications
What problem does this paper attempt to address?