Real-time Distributed-Random-Forest-Based Network Intrusion Detection System Using Apache Spark

Hao Zhang, Shumin Dai, Yongdan Li, Wenjun Zhang
DOI: https://doi.org/10.1109/pccc.2018.8711068
2018-01-01
Abstract: With the rapid growth of Internet services, network traffic data has become very large and complex, increasing the possibility of intrusions. Traditional intrusion detection systems (IDSs) cannot detect intrusions in such high-speed traffic data. A real-time network IDS should process large volumes of network traffic as quickly as possible so that malicious traffic is detected as early as possible. In this paper, we therefore propose a network intrusion detection framework based on a distributed random forest that can handle high-speed traffic data. The framework consists of three parts: data capture based on NetFlow, data preprocessing, and classification-based intrusion detection. We adapt the random forest classification algorithm to the Apache Spark distributed processing system to achieve real-time detection. To verify the effectiveness of the framework, we implement the system and perform several comparison experiments. The results show that the system achieves satisfactory efficiency and accuracy compared with existing systems and is therefore well suited to high-capacity, high-speed real-time network intrusion detection.
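The three stages named in the abstract (flow capture, preprocessing, classification by an ensemble of trees) can be illustrated with a minimal single-process sketch. This is not the authors' implementation and it does not use Apache Spark or real NetFlow export. The flow records, the feature names (`packets`, `bytes`, `distinct_dst_ports`), and all function names are assumptions invented for illustration, and the "forest" is a toy made of depth-one trees, each trained on a bootstrap sample with one randomly chosen feature and combined by majority vote.

```python
import random
from collections import Counter

# Invented, deliberately well-separated flow records (stand-in for captured NetFlow data):
# (packets, bytes, distinct_dst_ports, label), label 1 = malicious, 0 = benign.
FLOWS = [
    (3, 180, 1, 0), (5, 400, 2, 0), (8, 900, 1, 0), (6, 700, 3, 0),
    (900, 54000, 40, 1), (1200, 72000, 60, 1), (800, 48000, 50, 1), (1000, 60000, 45, 1),
]

def fit_scaler(rows):
    """Preprocessing: return per-feature (min, max) pairs from the training rows."""
    columns = list(zip(*[r[:-1] for r in rows]))
    return [(min(c), max(c)) for c in columns]

def scale(features, scaler):
    """Min-max scale one feature vector with a fitted scaler."""
    return tuple((x - lo) / (hi - lo) if hi > lo else 0.0
                 for x, (lo, hi) in zip(features, scaler))

def majority(labels, default=0):
    """Most common label, or `default` for an empty list."""
    return Counter(labels).most_common(1)[0][0] if labels else default

def train_stump(sample, feature):
    """Fit a depth-one tree on one feature; returns (feature, threshold, below, above)."""
    values = sorted({x[feature] for x, _ in sample})
    fallback = majority([y for _, y in sample])
    best, best_errors = (feature, values[0], fallback, fallback), len(sample) + 1
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        below = majority([y for x, y in sample if x[feature] <= t])
        above = majority([y for x, y in sample if x[feature] > t])
        errors = sum(y != (above if x[feature] > t else below) for x, y in sample)
        if errors < best_errors:
            best, best_errors = (feature, t, below, above), errors
    return best

def train_forest(data, n_trees=15, seed=0):
    """Train each tree on a bootstrap sample and one randomly chosen feature."""
    rng = random.Random(seed)
    n_features = len(data[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(data) for _ in data]  # sampling with replacement
        forest.append(train_stump(sample, rng.randrange(n_features)))
    return forest

def predict(forest, x):
    """Classify a scaled feature vector by majority vote over all trees."""
    return majority([above if x[f] > t else below for f, t, below, above in forest])

scaler = fit_scaler(FLOWS)
train = [(scale(r[:-1], scaler), r[-1]) for r in FLOWS]
forest = train_forest(train)

# Classify two new, equally hypothetical flows.
for raw in [(4, 300, 1), (950, 57000, 55)]:
    label = predict(forest, scale(raw, scaler))
    print(raw, "malicious" if label else "benign")
```

Because each tree in this sketch depends only on its own bootstrap sample, the trees could in principle be trained independently of one another. How the paper actually distributes training and detection across a Spark cluster is not stated in the abstract and is not modeled here.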