Applying Hadoop for log analysis toward distributed IDS

Jakrarin Therdphapiyanak, Krerk Piromsopa
DOI: https://doi.org/10.1145/2448556.2448559
2013-01-01
Abstract: In this paper, we apply Hadoop to large-scale log analysis. Our main objective is to efficiently detect abnormal traffic in high-volume data. Because traffic volumes are high, the size of traffic logs usually exceeds the capacity of a standalone IDS, which makes useful analysis of these data practically impossible. In most cases, analysis is performed only after an attack has occurred, for digital forensics. We propose applying the K-Means algorithm to cluster high-volume log data. The resulting clusters are useful for classifying the minority as possible intruders. In addition, we propose an IP address summarization method to capture the characteristics of each cluster. Our implementation allows high-volume traffic data to be analyzed by a distributed analysis system using the K-Means algorithm and data mining. The eventual goal is to reduce the chance of being attacked. The prominent points of our implementation are anomaly detection on large files and distributed processing. However, this paper is only a preliminary study, and several opportunities for optimization remain. Nonetheless, our implementation can point out anomalies, and the K-Means algorithm can provide new knowledge useful for enhancing the security of the system.
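The clustering idea in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' Hadoop implementation: the feature vectors (requests per minute, distinct ports contacted), the 10% minority threshold, and the function names `kmeans` and `minority_clusters` are all assumptions made for illustration. In a distributed setting, the assignment step would typically be done by mappers and the centroid update by reducers.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]


def kmeans(points, k, iterations=20, seed=0):
    """Plain K-Means: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    for _ in range(iterations):
        labels = assign(points, centroids)
        for c in range(k):
            members = [p for p, a in zip(points, labels) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return assign(points, centroids), centroids


def minority_clusters(labels, k, threshold=0.1):
    """Clusters holding at most `threshold` of all points (assumed cutoff)."""
    n = len(labels)
    counts = [labels.count(c) for c in range(k)]
    return [c for c in range(k) if 0 < counts[c] <= threshold * n]


# Hypothetical per-IP features: (requests per minute, distinct ports contacted).
normal = [(8 + i * 0.2, 1 + (i % 3)) for i in range(20)]
unusual = [(500.0, 60.0), (520.0, 64.0)]
points = normal + unusual

labels, centroids = kmeans(points, k=2)
flagged = minority_clusters(labels, k=2)
suspects = [i for i, a in enumerate(labels) if a in flagged]
print("indices of possible intruders:", suspects)
```

Flagging by cluster size only suggests candidates for inspection; a small cluster may also be legitimate but rare traffic.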