Performance Evaluation and Comparative Analysis of Machine Learning Models on the UNSW-NB15 Dataset: A Contemporary Approach to Cyber Threat Detection
Afrah Fathima, Amir Khan, Md Faizan Uddin, Mohammad Maqbool Waris, Sultan Ahmad, Cesar Sanin, Edward Szczerbicki
DOI: https://doi.org/10.1080/01969722.2023.2296246
IF: 1.859
2023-12-27
Cybernetics & Systems
Abstract: This research uses the University of New South Wales Network Based 2015 (UNSW-NB15) dataset to investigate the evolving nature of cyber threats, moving away from the outdated Knowledge Discovery and Data Mining competition 1999 (KDD Cup99) dataset. The data preparation pipeline consists of steps that ensure the integrity and suitability of the data for analysis. Null values are removed first. One-hot encoding is then applied to categorical features, min-max scaling is used to normalize the data, and label encoding is applied to the binary labels. Feature selection is performed using the Pearson correlation coefficient. Six machine learning models are evaluated comprehensively for binary classification, using accuracy, precision, recall, and F1 score as the key performance measures. The Random Forest model achieved the strongest results, with an accuracy of 99%, an F1 score of 98%, and well-balanced precision and recall of 98% each. The Support Vector Machine, Gradient Boosting, Logistic Regression, Decision Tree, and K-Nearest Neighbors models also performed well, with accuracy and F1 scores of around 98%. For multi-class classification, we examined several machine learning models, all of which performed robustly, with accuracy ranging from 97% to 98%. These results show that the models classify the data effectively, consistently achieving high precision, recall, and F1 scores for positive-class predictions. This study offers a current perspective on cyber threat identification and highlights the suitability of several machine learning models for this rapidly changing field.
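The preprocessing steps and binary metrics named in the abstract can be illustrated with a minimal plain-Python sketch. It is not the authors' code. The toy records, the `preprocess` helper, and the 0.3 threshold are hypothetical. The column names `proto`, `dur`, `sbytes`, `dpkts`, and `label` are borrowed from UNSW-NB15 for readability, but the values are invented. The abstract does not say how the Pearson correlation coefficient drives the selection. This sketch assumes one common variant: keep each feature whose absolute correlation with the encoded label meets a threshold.

```python
import math
from typing import Dict, List, Optional, Sequence

Row = Dict[str, Optional[object]]


def drop_nulls(rows: List[Row]) -> List[Row]:
    """Step 1: keep copies of only the records with no missing (None) values."""
    return [dict(r) for r in rows if all(v is not None for v in r.values())]


def one_hot(rows: List[Row], column: str) -> List[Row]:
    """Step 2: replace a categorical column with one 0/1 indicator per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new = {k: v for k, v in r.items() if k != column}
        for c in categories:
            new[f"{column}_{c}"] = 1.0 if r[column] == c else 0.0
        encoded.append(new)
    return encoded


def min_max_scale(rows: List[Row], columns: Sequence[str]) -> None:
    """Step 3: rescale each numeric column in place to the range [0, 1]."""
    for col in columns:
        lo = min(r[col] for r in rows)
        hi = max(r[col] for r in rows)
        span = hi - lo
        for r in rows:
            # A constant column has no spread; map it to 0.0 instead of dividing by zero.
            r[col] = (r[col] - lo) / span if span else 0.0


def label_encode(labels: Sequence[str], positive: str) -> List[int]:
    """Step 4: encode the binary label as 1 (positive class) or 0 (otherwise)."""
    return [1 if x == positive else 0 for x in labels]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def preprocess(rows, categorical, numeric, label_col, positive, threshold):
    """Hypothetical helper: steps 1-4, then keep features with |r| >= threshold."""
    clean = drop_nulls(rows)
    y = label_encode([r.pop(label_col) for r in clean], positive)
    for col in categorical:
        clean = one_hot(clean, col)
    min_max_scale(clean, numeric)
    # Assumed selection rule: correlation of each feature with the encoded label.
    selected = [c for c in clean[0] if abs(pearson([r[c] for r in clean], y)) >= threshold]
    X = [[r[c] for c in selected] for r in clean]
    return X, y, selected


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1, treating class 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy records; the values are invented for illustration only.
rows = [
    {"proto": "tcp", "dur": 0.5, "sbytes": 300.0, "dpkts": 1.0, "label": "normal"},
    {"proto": "udp", "dur": 2.0, "sbytes": 5000.0, "dpkts": 1.0, "label": "attack"},
    {"proto": "tcp", "dur": None, "sbytes": 120.0, "dpkts": 5.0, "label": "normal"},
    {"proto": "udp", "dur": 3.0, "sbytes": 4200.0, "dpkts": 2.0, "label": "attack"},
    {"proto": "tcp", "dur": 0.2, "sbytes": 150.0, "dpkts": 2.0, "label": "normal"},
]
X, y, selected = preprocess(rows, ["proto"], ["dur", "sbytes", "dpkts"], "label", "attack", 0.3)
print(selected, y)
```

On this toy input, the record with a missing `dur` is dropped and `y` becomes `[0, 1, 1, 0]`. The `dpkts` column has zero correlation with the label on the remaining rows, so it is discarded, leaving `dur`, `sbytes`, `proto_tcp`, and `proto_udp`. Pearson's r is unchanged by positive linear rescaling, so running selection after min-max scaling does not alter which features pass the threshold. `binary_metrics` shows how accuracy, precision, recall, and F1 are derived from the confusion-matrix counts once any of the six classifiers has produced predictions.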
computer science, cybernetics