Abstract: Computer networks rely on Intrusion Detection Systems (IDSs) and Intrusion Prevention Systems (IPSs) to ensure the security, reliability, and availability of an organization. In recent years, various approaches have been developed and implemented to create effective IDSs and IPSs. This paper focuses on IDSs that use Machine Learning (ML) techniques for improved accuracy. ML-based IDSs have proven successful in detecting network attacks. However, their performance tends to decline in high-dimensional data spaces. To address this issue, it is essential to develop a suitable feature extraction strategy that identifies and removes irrelevant features that do not contribute significantly to the classification process. Additionally, many ML-based IDSs exhibit high false positive rates and poor detection accuracy when trained on unbalanced datasets. In this study, we analyze the UNSW-NB15 IDS dataset, which serves as the training and testing data for our models. To reduce the feature space and improve the efficiency of our analysis, we apply a filter-based feature reduction method built on the Pearson correlation coefficient. By selecting only the most relevant features, we streamline the dataset and focus on the variables that have the highest impact on the analysis. This approach not only reduces computational complexity but also improves the interpretability of our results by eliminating unnecessary noise from the data. After applying the feature reduction technique, we implement a range of machine learning methods to perform the classification task. These include well-known algorithms such as Stacking, Extra Trees, Multi-Layer Perceptron, XGBoost, K-Nearest Neighbors, Logistic Regression, Naïve Bayes, Support Vector Machine, Random Forest, and Decision Tree.
By employing a diverse set of algorithms, we are able to explore different modeling approaches and evaluate their effectiveness in accurately classifying the various types of attacks. To assess the performance of our classification models, we use a range of evaluation metrics: Root Mean Square Error (RMSE), Mean Absolute Error (MAE), R2-Score, Mean Squared Error (MSE), Precision, F1-Score, Recall, and Accuracy. These metrics show how well the models perform across different dimensions, including the accuracy of predictions, the precision in classifying different attack types, and the overall goodness-of-fit of the models. Considering multiple metrics gives a more nuanced view of the strengths and weaknesses of each algorithm, supports informed decisions about its suitability for the classification task, and provides a thorough evaluation of the classifiers' effectiveness in detecting network intrusions.
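
The filter-based reduction step described above can be illustrated with a minimal sketch. It assumes the filter ranks each feature by the absolute Pearson correlation with the class label and keeps those at or above a cutoff; the abstract does not say whether the correlation is computed against the label or between features, nor which threshold is used, so the `threshold` value, the function names, and the toy column names below are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, labels, threshold=0.3):
    """Keep the feature names whose |r| with the label meets the threshold.

    `threshold` is an illustrative assumption, not a value from the paper.
    """
    scores = {name: pearson(values, labels) for name, values in columns.items()}
    kept = [name for name, r in scores.items() if abs(r) >= threshold]
    return kept, scores


# Toy data with hypothetical column names (not actual UNSW-NB15 fields).
columns = {
    "duration": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],   # rises with the label
    "constant_flag": [1.0] * 6,                   # zero variance
    "noise": [1.0, -1.0, 0.0, 0.0, 1.0, -1.0],    # uncorrelated with the label
}
labels = [0, 0, 0, 1, 1, 1]  # 0 = normal traffic, 1 = attack

kept, scores = select_features(columns, labels)
print(kept)
```

On this toy data only `duration` survives the filter. A label-correlation filter of this kind is cheap because it scores each feature independently, but it only captures linear relationships and does not remove features that are redundant with one another.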
