KMDT: A Hybrid Cluster Approach for Anomaly Detection Using Big Data

Santosh Thakur, Ramesh Dharavath
DOI: https://doi.org/10.1007/978-981-10-7563-6_18
2018-01-01
Abstract: In the current digital era, huge volumes of data are being generated from many different sources. This has given rise to the processing paradigm known as Big Data. Managing and processing such data on parallel clusters is a major challenge. To address this problem, we propose a hybrid algorithm for cluster analysis that uses the Spark framework to analyze Big Data instances. The proposed algorithm combines two machine learning techniques, K-Means (KM) and the C5.0 Decision Tree (DT). Euclidean distance is used to assign each instance to its nearest cluster, and a separate DT is then built for each cluster using the C5.0 algorithm. The rules inferred by each DT are used to classify the instances of large datasets as anomalous or normal. Experimental results show that the proposed hybrid algorithm outperforms existing algorithms and achieves better classification accuracy for anomaly detection.
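The two-stage idea described in the abstract (cluster by Euclidean distance, then train one tree per cluster) can be illustrated with a minimal single-machine sketch. This is not the authors' implementation: it does not use Spark, and a one-split decision stump stands in for a full C5.0 tree. The toy data, the initial centroids, the global fallback stump, and all function names (`kmeans`, `fit_stump`, `fit_kmdt`, `predict_kmdt`) are assumptions made for illustration only.

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(point, centroids):
    # Index of the centroid closest to the point (Euclidean distance).
    return min(range(len(centroids)), key=lambda i: euclidean(point, centroids[i]))

def kmeans(points, init_centroids, iterations=10):
    centroids = [list(c) for c in init_centroids]
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for i, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(g) for col in zip(*g)]
    return centroids

def fit_stump(rows, labels):
    # One-split decision stump: a simplified stand-in for a C5.0 tree.
    majority = Counter(labels).most_common(1)[0][0]
    best_err = sum(l != majority for l in labels)
    best = (None, None, majority, majority)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [l for r, l in zip(rows, labels) if r[f] <= t]
            right = [l for r, l in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            ll = Counter(left).most_common(1)[0][0]
            rl = Counter(right).most_common(1)[0][0]
            err = sum(l != ll for l in left) + sum(l != rl for l in right)
            if err < best_err:
                best_err, best = err, (f, t, ll, rl)
    return best

def predict_stump(stump, row):
    f, t, left_label, right_label = stump
    if f is None:
        return left_label
    return left_label if row[f] <= t else right_label

def fit_kmdt(rows, labels, init_centroids):
    centroids = kmeans(rows, init_centroids)
    stumps = {}
    for i in range(len(centroids)):
        members = [(r, l) for r, l in zip(rows, labels) if nearest(r, centroids) == i]
        if members:
            stumps[i] = fit_stump([r for r, _ in members], [l for _, l in members])
    fallback = fit_stump(rows, labels)  # assumption: used only for empty clusters
    return centroids, stumps, fallback

def predict_kmdt(model, row):
    centroids, stumps, fallback = model
    return predict_stump(stumps.get(nearest(row, centroids), fallback), row)

# Hypothetical toy data: two groups, each with one labeled anomaly.
rows = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (1.1, 3.0),
        (8.0, 8.0), (8.2, 7.9), (7.9, 8.1), (10.5, 8.0)]
labels = ["normal", "normal", "normal", "anomaly",
          "normal", "normal", "normal", "anomaly"]
model = fit_kmdt(rows, labels, init_centroids=[(1.0, 1.0), (8.0, 8.0)])

for sample in [(1.0, 2.8), (1.0, 0.9), (9.8, 8.0), (8.1, 8.0)]:
    print(sample, predict_kmdt(model, sample))
```

In this toy setup each cluster learns its own split, so a value that would be unremarkable in one cluster can still be flagged as anomalous in another. A production version would replace the stump with a real decision tree learner and distribute both stages across a cluster.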