An optimized SVM-RFE based feature selection and weighted entropy K-means approach for big data clustering in mapreduce
Suman Madan, Komalavalli C, Manjot Kaur Bhatia, Chetna Laroiya, Monika Arora
DOI: https://doi.org/10.1007/s11042-023-18044-4
IF: 2.577
2024-02-16
Multimedia Tools and Applications
Abstract: In a digitalized world that generates data at massive scale, efficient big data clustering is essential. Clustering algorithms play an important role in managing computational complexity and in mining significant information from a dataset. Big data arriving from various sources is processed in the MapReduce framework (MRF) with the help of clustering algorithms. Applying clustering to big data remains difficult, however, chiefly because of computation cost and the need to finish in reasonable time. This research therefore introduces Competitive Jaya Leader Harris Hawks Optimization assisted Entropy Weighted Power K-Means Clustering (CJayaLHHO_EWPKMC) for big data clustering. The entire method runs in the MapReduce (MR) framework. In the mapper, feature selection is performed using Support Vector Machine-Recursive Feature Elimination (SVM-RFE) assisted by Jaya Leader Harris Hawks Optimization (JayaLHHO). In the reducer, clustering is performed using the EWPKMC method, whose weights are adjusted by the CJayaLHHO algorithm to produce the clustering outcome. The proposed method is scalable, simple, cost-effective, and able to integrate with other technologies. Experimental results show that the developed method outperforms conventional methods, attaining a clustering accuracy of 0.937, a Jaccard coefficient of 0.913, and a Rand coefficient of 0.912.
computer science, information systems, theory & methods,engineering, electrical & electronic, software engineering
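The abstract reports a Jaccard coefficient and a Rand coefficient but does not define them. The sketch below assumes the standard pair-counting definitions used for comparing a clustering against reference labels; the paper may compute them differently. The function names (`pair_counts`, `rand_coefficient`, `jaccard_coefficient`) are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def pair_counts(truth, pred):
    """Classify every pair of points by whether the two labelings agree.

    Returns (ss, sd, ds, dd):
      ss: same cluster in both truth and pred
      sd: same in truth, different in pred
      ds: different in truth, same in pred
      dd: different in both
    """
    if len(truth) != len(pred):
        raise ValueError("label sequences must have the same length")
    ss = sd = ds = dd = 0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_pred = pred[i] == pred[j]
        if same_truth and same_pred:
            ss += 1
        elif same_truth:
            sd += 1
        elif same_pred:
            ds += 1
        else:
            dd += 1
    return ss, sd, ds, dd


def rand_coefficient(truth, pred):
    """Assumed definition: fraction of pairs on which both labelings agree."""
    ss, sd, ds, dd = pair_counts(truth, pred)
    total = ss + sd + ds + dd
    return (ss + dd) / total if total else 1.0


def jaccard_coefficient(truth, pred):
    """Assumed definition: ss / (ss + sd + ds), ignoring pairs split in both."""
    ss, sd, ds, _ = pair_counts(truth, pred)
    denom = ss + sd + ds
    return ss / denom if denom else 1.0


if __name__ == "__main__":
    reference = [0, 0, 1, 1]
    clustering = [0, 1, 1, 1]
    print(rand_coefficient(reference, clustering))     # 0.5
    print(jaccard_coefficient(reference, clustering))  # 0.25
```

Both measures depend only on which points are grouped together, so relabeling the clusters does not change the score. The quadratic loop over pairs is fine for a small illustration; at big data scale the same counts would normally be derived from a contingency table instead.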