An Intelligent Clustering Algorithm for High-Dimensional Multiview Data in Big Data Applications

Qian Tao, Chunqin Gu, Zhenyu Wang, Daoning Jiang
DOI: https://doi.org/10.1016/j.neucom.2018.12.093
IF: 6
2019-01-01
Neurocomputing
Abstract: High-dimensional multiview data are common in big data applications. Classic clustering algorithms handle such data poorly because they treat all features as equally relevant. To tackle this problem, this paper proposes a novel intelligent weighting k-means clustering (IWKM) algorithm based on swarm intelligence. First, the degree of coupling between clusters is introduced into the clustering model to enlarge the dissimilarity between clusters, and separate weights for views and features are used in the weighted distance function to assign objects to clusters. Second, to eliminate sensitivity to the initial cluster centers, swarm intelligence is used to find the initial cluster centers, the view weights, and the feature weights by a global search. Lastly, a precise perturbation is proposed to improve the optimization performance of the swarm intelligence. To verify clustering performance on high-dimensional multiview data, experiments were evaluated with the Rand Index, Jaccard Coefficient and Folkes Russe metrics on five big data applications, on two computational platforms: Apache Spark and a single node. The experimental results show that IWKM is effective and efficient in clustering high-dimensional multiview data, and that it outperforms five other approaches on these complex data sets with more views and higher dimensions, on both Apache Spark and a single node.
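The abstract describes two ingredients that are easy to picture in code: a distance function weighted by both view and feature, and pair-counting evaluation metrics. The sketch below is only an illustration under stated assumptions. The product form `view_w[v] * feat_w[j]`, the function names, and the data layout are hypothetical and are not taken from the paper; the coupling term, the swarm-intelligence search, and the perturbation step are not reproduced. The Rand Index and Jaccard Coefficient follow their standard pair-counting definitions.

```python
from itertools import combinations


def weighted_distance(x, center, views, view_w, feat_w):
    """Squared distance in which each feature difference is scaled by its
    feature weight and by the weight of the view it belongs to.
    (Assumed form for illustration; the paper's exact function may differ.)"""
    total = 0.0
    for v, feature_indices in enumerate(views):
        for j in feature_indices:
            total += view_w[v] * feat_w[j] * (x[j] - center[j]) ** 2
    return total


def assign(points, centers, views, view_w, feat_w):
    """Assign every point to the index of its nearest center under the weighted distance."""
    return [
        min(
            range(len(centers)),
            key=lambda k: weighted_distance(p, centers[k], views, view_w, feat_w),
        )
        for p in points
    ]


def rand_and_jaccard(truth, pred):
    """Pair-counting Rand Index and Jaccard Coefficient between two labelings."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_pred = pred[i] == pred[j]
        if same_truth and same_pred:
            tp += 1  # pair grouped together in both labelings
        elif same_pred:
            fp += 1  # grouped together only in the prediction
        elif same_truth:
            fn += 1  # grouped together only in the ground truth
        else:
            tn += 1  # separated in both labelings
    rand = (tp + tn) / (tp + fp + fn + tn)
    jaccard = tp / (tp + fp + fn) if (tp + fp + fn) else 1.0
    return rand, jaccard


# Two views: view 0 holds features 0 and 1, view 1 holds feature 2 (a noisy one).
views = [[0, 1], [2]]
points = [[0.0, 0.0, 10.0], [0.1, 0.0, 0.0], [5.0, 5.0, 10.0], [5.0, 5.1, 0.0]]
centers = [[0.0, 0.0, 5.0], [5.0, 5.0, 5.0]]

# Setting the weight of view 1 to zero makes the noisy feature irrelevant.
labels = assign(points, centers, views, view_w=[1.0, 0.0], feat_w=[1.0, 1.0, 1.0])
scores = rand_and_jaccard([0, 0, 1, 1], labels)
```

In this toy setup the third feature carries no cluster information, so down-weighting its view lets the remaining features determine the assignment. In IWKM the weights are found by the swarm search rather than set by hand.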