Abstract: High-dimensional multi-view data arise in many complex, large-scale applications in big data environments. However, traditional clustering algorithms treat all features as equally relevant, which makes them poorly suited to such data. To address this challenge, we propose a novel approach named the intelligent weighting k-means clustering approach (IWKM), which combines swarm intelligence with the k-means algorithm. Because k-means is sensitive to its initial cluster centers, IWKM uses the global search capability of swarm intelligence to find the initial cluster centers together with the view and feature weights. The weighting k-means approach is then applied, starting from these initial cluster centers and weights, to determine the cluster of each object. IWKM has the following characteristics. In the clustering model, every view and every feature has its own weight, and these weights affect the cluster to which an object is assigned. The view and feature weights are calculated by a swarm intelligence algorithm. In addition, the degree of coupling between clusters is introduced into the clustering model to enlarge the dissimilarity between clusters. Comprehensive experiments are conducted on three high-dimensional multi-view data sets from a machine learning repository. The results are compared with those of five other state-of-the-art clustering algorithms using the Rand index, Jaccard coefficient and Folkes Russel evaluation metrics. The experiments show that the new approach generates better clustering results on high-dimensional multi-view data in a big data environment.
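The second stage described in the abstract, where view and feature weights influence cluster assignment, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a weighted squared Euclidean distance in which each feature term is scaled by its feature weight and by the weight of its view. The function names, the `view_of_feature` mapping and the product form of the weighting are hypothetical. The swarm intelligence search and the inter-cluster coupling term are omitted, so the initial centers and the weights are passed in as fixed inputs.

```python
from typing import List, Sequence, Tuple


def weighted_sq_distance(x: Sequence[float], center: Sequence[float],
                         feature_weights: Sequence[float],
                         view_weights: Sequence[float],
                         view_of_feature: Sequence[int]) -> float:
    """Squared distance with each feature scaled by its own weight and by
    the weight of the view it belongs to (assumed product form)."""
    total = 0.0
    for j, (xj, cj) in enumerate(zip(x, center)):
        w = view_weights[view_of_feature[j]] * feature_weights[j]
        total += w * (xj - cj) ** 2
    return total


def assign_clusters(data, centers, feature_weights, view_weights,
                    view_of_feature) -> List[int]:
    """Assign every object to the center with the smallest weighted distance."""
    labels = []
    for x in data:
        dists = [weighted_sq_distance(x, c, feature_weights, view_weights,
                                      view_of_feature) for c in centers]
        labels.append(dists.index(min(dists)))
    return labels


def update_centers(data, labels, centers) -> List[List[float]]:
    """Recompute each center as the mean of its members.
    An empty cluster keeps its previous center."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [x for x, lab in zip(data, labels) if lab == k]
        if not members:
            new_centers.append(list(old))
            continue
        dim = len(old)
        new_centers.append([sum(m[j] for m in members) / len(members)
                            for j in range(dim)])
    return new_centers


def weighted_kmeans(data, init_centers, feature_weights, view_weights,
                    view_of_feature,
                    max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Alternate assignment and center updates until the labels stop changing.
    Weights and initial centers are fixed inputs in this sketch."""
    centers = [list(c) for c in init_centers]
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = assign_clusters(data, centers, feature_weights,
                                     view_weights, view_of_feature)
        if new_labels == labels:
            break
        labels = new_labels
        centers = update_centers(data, labels, centers)
    return labels, centers
```

With features 0 and 1 placed in one view and feature 2 in another, lowering the weight of the second view can move an object to a different cluster, which is the effect the abstract attributes to the weights.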

Clustering Technology for High Dimensional Data Based on Semantics

Document Clustering Using Locality Preserving Indexing

Semi-supervised Hierarchical Clustering Analysis for High Dimensional Data

Semantic document clustering based on ontology

High-Efficiency Text Clustering Algorithm Based on Semantic Distance

Towards Semantically Sensitive Text Clustering: a Feature Space Modeling Technology Based on Dimension Extension

A Statistics-Based Semantic Relation Analysis Approach For Document Clustering

Hybrid Clustering of Data and Vague Concepts Based on Labels Semantics

An Approach of Latent Semantic Space Partition and Web Document Clustering

High-Order Co-clustering Text Data on Semantics-Based Representation Model

Document Clustering Based on Semantic Smoothing Approach

Enhanced Locality Sensitive Clustering in High Dimensional Space

Study on Massive Short Documents Clustering Technology

Research on High Dimensional Clustering Algorithm Based on Similarity Measure

Clustering-based Semantic Retrieval Algorithm

A Semantic Approach for Text Clustering Using WordNet and Lexical Chains

A Novel Text Clustering Algorithm Based on Inner Product Space Model of Semantic

Data Clustering Method with Feature Semantic Weight

State of the art document clustering algorithms based on semantic similarity

A Novel Intelligent Clustering Approach For High Dimensional Data In A Big Data Environment

Short documents clustering in very large text databases