Implementation and visualization of a netflow log data lake system for cyberattack detection using distributed deep learning

Jiang, Cheng-Tian; Kristiani, Endah
DOI: https://doi.org/10.1007/s11227-022-04802-y
2022-10-07
Abstract: Big data and artificial intelligence (AI) technologies are complex systems that have continued to develop in recent years. This paper implemented a data lake architecture to handle massive data and perform data analysis in real time. Using a data lake and an AI model, a NetFlow storage monitoring system was deployed to provide a platform that covers the storage, query, analysis, and visualization of massive volumes of data. The big data platform was built on Cloudera and utilized big data tools such as Kafka, Spark, HBase, Hive, and Impala. In addition, we used Spark to develop network threat recognition models using distributed deep learning, with a deep neural network (DNN) as the trained model. We then evaluated the model's performance: it reached 94% accuracy while reducing training time by 48%. The experimental results demonstrate that deep learning model training time is significantly shortened. Additionally, the system was run under several configurations to assess the factors influencing accuracy and performance. The model was evaluated using a confusion matrix to demonstrate that it can accurately detect attack behavior in log data. Furthermore, we developed a real-time log data monitoring and analysis system to demonstrate the proposed architecture.
computer science, theory & methods; engineering, electrical & electronic; hardware & architecture