Abstract: Network Intrusion Detection Systems (NIDSs) are an increasingly important tool for the prevention and mitigation of cyber attacks. In recent years, considerable research effort has aimed at leveraging increasingly powerful Machine Learning (ML) models for this purpose. Researchers have generated a number of labelled synthetic datasets and made them publicly available, and these have become the benchmarks against which new ML-based NIDS classifiers are evaluated. Recently published results show excellent classification performance on these datasets, approaching 100 percent across key evaluation metrics such as Accuracy, F1 score and AUC. Unfortunately, we have not yet seen these excellent academic research results translated into practical NIDSs with such near-perfect performance. This motivated the research presented in this paper, in which we analyse the statistical properties of the benign traffic in three of the more recent and relevant NIDS datasets (CIC_IDS, UNSW_NB15 and TON_IOT) after converting them into a common flow format. For comparison, we consider two datasets obtained from real-world production networks, one from a university network and one from a medium-sized Internet Service Provider (ISP). Our results show that the two real-world datasets are quite similar to each other with regard to most of the considered statistical features. Likewise, the three synthetic datasets are relatively similar within their group. However, and most importantly, our results show a distinct difference between the three synthetic datasets and the two real-world datasets for most of the considered statistical features. Since ML relies on the basic assumption that training and test datasets are sampled from the same distribution, this raises the question of how well the performance results of ML classifiers trained on the considered synthetic datasets can translate and generalise to real-world networks. We believe this is an interesting and relevant question that motivates further research in this space.
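To make the idea of comparing per-feature distributions across datasets concrete, the following is a minimal sketch and not the method used in the paper. It assumes each dataset has already been converted to a list of flow records (dictionaries) in a common format, and it uses the two-sample Kolmogorov-Smirnov statistic as one possible distance between empirical distributions. The abstract does not state which statistics were actually used, and the feature names and values below are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the two empirical CDFs:
    0.0 means identical empirical distributions, 1.0 means no overlap.
    """
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    max_gap = 0.0
    # Both empirical CDFs are step functions that only change at sample
    # points, so checking every observed value finds the maximum gap.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


def compare_features(flows_a, flows_b, features):
    """Compute the KS statistic per feature for two lists of flow records."""
    return {
        name: ks_statistic(
            [flow[name] for flow in flows_a],
            [flow[name] for flow in flows_b],
        )
        for name in features
    }


# Hypothetical benign flows; the feature names and values are illustrative only.
synthetic_flows = [
    {"duration_ms": 10, "bytes_in": 500},
    {"duration_ms": 12, "bytes_in": 520},
    {"duration_ms": 11, "bytes_in": 510},
    {"duration_ms": 13, "bytes_in": 530},
]
real_flows = [
    {"duration_ms": 200, "bytes_in": 510},
    {"duration_ms": 350, "bytes_in": 525},
    {"duration_ms": 90, "bytes_in": 505},
    {"duration_ms": 400, "bytes_in": 540},
]

distances = compare_features(synthetic_flows, real_flows, ["duration_ms", "bytes_in"])
```

A statistic near 1 for a feature suggests that the two datasets barely overlap on it, which is the kind of mismatch that would undermine the shared-distribution assumption discussed in the abstract. A value near 0 suggests the feature is distributed similarly in both datasets.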
