Abstract: Network Intrusion Detection Systems (NIDSs) detect intrusion attacks in network traffic. In particular, machine-learning-based NIDSs have attracted attention because of their high detection rates for unknown attacks. A distributed processing framework for machine-learning-based NIDSs employing a scalable distributed stream processing system has been proposed in the literature. However, its performance when machine-learning-based classifiers are implemented has not been comprehensively evaluated. In this study, we implement five representative classifiers (Decision Tree, Random Forest, Naive Bayes, SVM, and kNN) based on this framework and evaluate their throughput and latency. Through experimental measurements, we investigate the differences in processing performance among these classifiers and the bottlenecks in the processing performance of the framework.

### What problem does this paper attempt to solve?

This paper addresses the lack of performance evaluation for machine-learning-based network intrusion detection systems (MLNIDSs) running on distributed processing frameworks. Specifically, it focuses on the following points:

1. **Detection of unknown attacks**: Traditional signature-based NIDSs cannot detect unknown attacks because the patterns of these attacks are not in the system's signature database. MLNIDSs have been proposed in recent years to address this; they can detect unknown attacks by learning known attack patterns.
2. **Insufficient performance evaluation of existing frameworks**: Although several distributed processing frameworks for implementing MLNIDSs have been proposed [4,5], these studies have not comprehensively evaluated performance when machine-learning classifiers are implemented. In particular, the actual throughput and latency of these frameworks when processing large-scale network traffic have not been fully verified.
3. **Challenges of system scaling**: Existing research focuses mainly on classification performance and less on processing speed, so it is difficult to determine how much network traffic an MLNIDS can handle. This in turn makes it difficult to size the system for practical deployments.

To fill these gaps, the authors constructed an MLNIDS based on the distributed processing framework proposed by Tada et al. and implemented five representative machine-learning classifiers (decision tree, random forest, naive Bayes, support vector machine, and k-nearest neighbor). Through experimental measurements, they evaluated the throughput and latency of these classifiers to examine the differences in processing performance among them and the performance bottlenecks in the framework.

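To make the two metrics concrete, the following is a minimal sketch of how throughput (records per second) and mean per-record latency could be measured for a single classifier stage. It is not the authors' measurement method: the paper measures these quantities across the whole framework, whereas this sketch times only one function call per record. The names `measure` and `threshold_classifier`, the `bytes` field, and the threshold value are all hypothetical and chosen for illustration.

```python
import time
from statistics import mean


def measure(classify, records):
    """Return (throughput in records/s, mean per-record latency in seconds)."""
    latencies = []
    start = time.perf_counter()
    for record in records:
        t0 = time.perf_counter()
        classify(record)  # the stage under test
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return len(records) / elapsed, mean(latencies)


# Hypothetical stand-in for a trained classifier: flags a flow as an
# attack when its byte count exceeds an arbitrary threshold.
def threshold_classifier(record):
    return record["bytes"] > 10_000


# Synthetic flow records (assumption: one dict per flow).
records = [{"bytes": n * 100} for n in range(1000)]

throughput, latency = measure(threshold_classifier, records)
print(f"{throughput:.0f} records/s, {latency * 1e6:.2f} us mean latency")
```

Swapping `threshold_classifier` for the prediction call of each trained model would give a rough per-classifier comparison, though the absolute numbers would depend on the hardware and would exclude any capture, parsing, and storage stages.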
### Main contributions of the paper

- **Comprehensive performance evaluation**: For the first time, a comprehensive performance evaluation of the machine-learning-based distributed NIDS framework was carried out, covering both throughput and latency.
- **Classifier performance comparison**: Through experiments, the performance of five common machine-learning classifiers in processing large-scale network traffic was compared, and the classifiers were found to differ significantly in performance.
- **Performance bottleneck analysis**: The performance bottlenecks in the framework were identified, mainly in the Zeek, Logstash, and Elasticsearch subsystems, providing directions for subsequent optimization.

### Conclusions and future work

- **Selecting appropriate classifiers**: Based on the experimental results, it is recommended to select classifiers according to network traffic conditions so as to balance classification performance and processing speed.
- **Further optimization**: It is suggested to improve the performance of Logstash and Elasticsearch by allocating high-performance nodes or by parallel processing.
- **Extended research**: Future work is planned to evaluate the performance of deep-learning classifiers and to test the stability and fault tolerance of the MLNIDS in long-term operation.

Through these efforts, this research provides guidance for constructing efficient and practical machine-learning-based network intrusion detection systems.

Practical Performance of a Distributed Processing Framework for Machine-Learning-based NIDS
