Practical Performance of a Distributed Processing Framework for Machine-Learning-based NIDS

Maho Kajiura, Junya Nakamura
2024-05-21
Abstract: Network Intrusion Detection Systems (NIDSs) detect intrusion attacks in network traffic. In particular, machine-learning-based NIDSs have attracted attention because of their high detection rates for unknown attacks. A distributed processing framework for machine-learning-based NIDSs that employs a scalable distributed stream processing system has been proposed in the literature. However, its performance when machine-learning-based classifiers are implemented has not been comprehensively evaluated. In this study, we implement five representative classifiers (Decision Tree, Random Forest, Naive Bayes, SVM, and kNN) based on this framework and evaluate their throughput and latency. Through experimental measurements, we investigate the differences in processing performance among these classifiers and the bottlenecks in the processing performance of the framework.
Cryptography and Security; Distributed, Parallel, and Cluster Computing; Machine Learning; Networking and Internet Architecture
### What problem does this paper attempt to solve?

This paper addresses the lack of performance evaluation for machine-learning-based network intrusion detection systems (MLNIDS) built on distributed processing frameworks. Specifically, it focuses on the following points:

1. **Detection of unknown attacks**: Traditional signature-based NIDS cannot detect unknown attacks because the patterns of these attacks are not in the system's signature database. To solve this problem, machine-learning-based NIDS (MLNIDS) have been proposed in recent years; they can detect unknown attacks by learning from known attack patterns.
2. **Insufficient performance evaluation of existing frameworks**: Although several distributed processing frameworks for implementing MLNIDS have been proposed [4,5], these studies have not comprehensively evaluated performance when machine-learning classifiers are implemented. In particular, the actual throughput and latency of these frameworks when processing large-scale network traffic have not been fully verified.
3. **Challenges of system scaling**: Because existing research focuses mainly on classification performance and less on processing speed, it is difficult to determine the scale of network traffic that an MLNIDS can handle. This in turn makes it difficult to plan the system size in practical deployments.

To fill these gaps, the authors constructed an MLNIDS based on the distributed processing framework proposed by Tada et al. and implemented five representative machine-learning classifiers (decision tree, random forest, naive Bayes, support vector machine, and k-nearest neighbor). Through experimental measurements, they evaluated the throughput and latency of these classifiers to explore the differences in processing performance among them and the performance bottlenecks in the framework.
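To make the two metrics concrete, the following is a minimal sketch of how throughput (records classified per second) and per-record latency could be measured for any classifier. It is not the authors' measurement harness: the `measure` helper, the `threshold_classifier` stand-in, and the sample records are all hypothetical, and the sketch times only the classification call rather than the end-to-end path through the framework.

```python
import statistics
import time
from typing import Callable, Dict, Sequence


def measure(classify: Callable[[Sequence[float]], int],
            records: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Classify records one at a time and report throughput and latency.

    `classify` is any function mapping a feature vector to a label.
    """
    if not records:
        raise ValueError("at least one record is required")

    latencies = []
    start = time.perf_counter()
    for record in records:
        t0 = time.perf_counter()
        classify(record)
        latencies.append(time.perf_counter() - t0)  # per-record latency
    elapsed = time.perf_counter() - start

    return {
        "records": len(records),
        "throughput_per_s": len(records) / elapsed,
        "mean_latency_s": statistics.fmean(latencies),
        "max_latency_s": max(latencies),
    }


def threshold_classifier(record: Sequence[float]) -> int:
    # Hypothetical stand-in for a trained model (assumption, not from the
    # paper): label a record as an attack (1) if its feature sum exceeds 1.5.
    return 1 if sum(record) > 1.5 else 0


if __name__ == "__main__":
    # Made-up feature vectors, used only to exercise the helper.
    sample = [[0.1, 0.2], [1.0, 0.9], [0.5, 0.5]] * 1000
    print(measure(threshold_classifier, sample))
```

Swapping `threshold_classifier` for the prediction function of a trained model would allow the same comparison across classifiers, although absolute numbers depend heavily on hardware and on the surrounding pipeline.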
### Main contributions of the paper

- **Comprehensive performance evaluation**: For the first time, a comprehensive performance evaluation of the machine-learning-based distributed NIDS framework was carried out, covering both throughput and latency.
- **Classifier performance comparison**: Through experiments, the performance of five common machine-learning classifiers in processing large-scale network traffic was compared, and the classifiers were found to differ significantly in performance.
- **Performance bottleneck analysis**: The performance bottlenecks in the framework were identified, mainly concentrated in the Zeek, Logstash, and Elasticsearch subsystems, providing directions for subsequent optimization.

### Conclusions and future work

- **Selecting appropriate classifiers**: Based on the experimental results, it is recommended to select classifiers appropriate to the network traffic conditions in order to balance classification performance and processing speed.
- **Further optimization**: It is suggested to improve the performance of Logstash and Elasticsearch by allocating high-performance nodes or by parallel processing.
- **Extended research**: Future work is planned to evaluate the performance of deep-learning classifiers and to test the stability and fault tolerance of the MLNIDS in long-term operation.

Through these efforts, this research provides important guidance for constructing efficient and practical machine-learning-based network intrusion detection systems.