Abstract: With the growing use of information technology in every domain of life, hacking has become more damaging than ever before. As technologies develop, the number of attacks is growing exponentially and attacks are becoming more sophisticated, so traditional intrusion detection systems (IDS) become inefficient at detecting them. This paper proposes a solution that not only detects new threats with a higher detection rate and a lower false positive rate than IDS already in use, but can also detect collective and contextual security attacks. We achieve these results with Networking Chatbot, a deep recurrent neural network, Long Short-Term Memory (LSTM), running on top of the Apache Spark framework. Its input is flow traffic and traffic aggregation, and its output is a language of two words, normal or abnormal. We propose merging the concepts of language processing, contextual analysis, distributed deep learning, big data, and anomaly detection by flow analysis. We propose a model that describes the abstract normal behavior of the network from a sequence of millions of packets within their context and analyzes them in near real time to detect point, collective, and contextual anomalies. Experiments on the MAWI dataset show a better detection rate than both signature-based IDS and traditional anomaly-based IDS. The experiments show a lower false positive rate, a higher detection rate, and better detection of point anomalies. As for proof of contextual and collective anomaly detection, we discuss our claim and the reasoning behind our hypothesis. Because of hardware limitations, however, the experiments were run on small random subsets of the dataset, so we share the experiment and our vision for future work in the hope that a full proof will be carried out by other interested researchers with better hardware infrastructure than ours.
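The input side described in the abstract, flow traffic plus traffic aggregation arranged as a sequence for a recurrent model, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the `Packet` fields, the one-second aggregation window, the three features per window (flow count, packet count, byte count), and the helper names. The LSTM itself and the Spark distribution layer are not shown.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical packet record; field names are illustrative only."""
    ts: float    # capture time in seconds
    src: str     # source address
    dst: str     # destination address
    sport: int   # source port
    dport: int   # destination port
    proto: str   # transport protocol
    size: int    # packet length in bytes


def aggregate_flows(packets, window=1.0):
    """Group packets by time window, then by 5-tuple flow key.

    Returns {window_index: {flow_key: [packet_count, byte_count]}}.
    """
    windows = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for p in packets:
        w = int(p.ts // window)
        key = (p.src, p.dst, p.sport, p.dport, p.proto)
        stats = windows[w][key]
        stats[0] += 1
        stats[1] += p.size
    return windows


def window_features(windows):
    """Build one [flows, packets, bytes] vector per window, in time order.

    Windows with no traffic are kept as zero vectors so the time axis
    stays evenly spaced.
    """
    if not windows:
        return []
    features = []
    for w in range(min(windows), max(windows) + 1):
        flows = windows.get(w, {})
        n_packets = sum(s[0] for s in flows.values())
        n_bytes = sum(s[1] for s in flows.values())
        features.append([len(flows), n_packets, n_bytes])
    return features


def to_sequences(features, seq_len):
    """Slide a window of seq_len consecutive vectors over the features.

    Each returned item is one input sequence for a sequence model.
    """
    return [features[i:i + seq_len]
            for i in range(len(features) - seq_len + 1)]
```

Each sequence produced by `to_sequences` would be paired with one of the two output words, normal or abnormal, when training a sequence classifier. Keeping empty windows as zero vectors is a design choice of this sketch: a sudden silence can itself be a contextual signal, so the gap is preserved instead of being dropped.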