Data Clustering: Integrating Different Distance Measures with Modified k-Means Algorithm

Vaishali R. Patel, Rupa G. Mehta
DOI: https://doi.org/10.1007/978-81-322-0491-6_63
2012-01-01
Abstract: Unsupervised learning is the process of partitioning a given data set into a number of clusters such that similar data objects belong to the same cluster and dissimilar data objects belong to different clusters. k-Means is a partition-based unsupervised learning algorithm that is popular for its simplicity and ease of use. However, k-Means suffers from a major shortcoming: the number of clusters and the initial centroids must be supplied in advance. Decimal scaling is a normalization approach that standardizes the features of the dataset and improves the effectiveness of the algorithm. Integrating different distance measures with the modified k-Means (Mk-Means) algorithm helps in selecting the proper distance measure for a specific data mining application. This paper compares the results of Mk-Means using different distance measures, namely Euclidean distance, Manhattan distance, Minkowski distance, and cosine distance, along with the decimal scaling normalization approach. The analysis is carried out on various datasets from the UCI Machine Learning Repository and shows that Mk-Means is advantageous and that its effectiveness improves when decimal scaling normalization and the Minkowski distance measure are used.
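The abstract names the ingredients being compared but does not describe the Mk-Means modification itself, so the Python sketch below covers only the generic pieces, assuming plain Lloyd-style k-Means rather than the authors' algorithm. Decimal scaling divides each feature by 10^j, where j is taken here as the smallest non-negative integer that brings every absolute value in that feature below 1. The four distance functions are pluggable into the clustering loop. All function names, the Minkowski order p=3, the use of the first k points as initial centroids, and the toy data in the demo are illustrative assumptions, not values from the paper.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Distance = Callable[[Vector, Vector], float]


def decimal_scaling(data: List[List[float]]) -> List[List[float]]:
    """Divide each column by 10**j, with j the smallest non-negative integer
    such that every absolute value in that column becomes smaller than 1."""
    n_cols = len(data[0])
    scales = []
    for c in range(n_cols):
        max_abs = max(abs(row[c]) for row in data)
        j = 0
        while max_abs >= 10 ** j:  # integer loop avoids log10 rounding issues
            j += 1
        scales.append(10 ** j)
    return [[row[c] / scales[c] for c in range(n_cols)] for row in data]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Vector, b: Vector) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def minkowski(a: Vector, b: Vector, p: float = 3.0) -> float:
    # p = 3 is an arbitrary illustrative order (assumption); p = 1 and p = 2
    # reduce to Manhattan and Euclidean distance respectively.
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def cosine_distance(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # assumption: treat a zero vector as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def kmeans(points: List[List[float]], k: int, distance: Distance,
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Plain Lloyd-style k-Means with a pluggable distance (not Mk-Means)."""
    centroids = [list(p) for p in points[:k]]  # naive init: first k points
    labels = [-1] * len(points)  # -1 means "not yet assigned"
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the loop has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Made-up toy data whose two features differ in scale by about 100x.
    data = [[1.0, 200.0], [1.5, 180.0], [8.0, 900.0], [9.0, 950.0]]
    scaled = decimal_scaling(data)  # column 0 divided by 10, column 1 by 1000
    for name, dist in [("Euclidean", euclidean), ("Manhattan", manhattan),
                       ("Minkowski (p=3)", minkowski),
                       ("cosine", cosine_distance)]:
        labels, _ = kmeans(scaled, 2, dist)
        print(f"{name}: {labels}")
```

Using the arithmetic mean as the centroid is exact for Euclidean distance but only a heuristic for Manhattan and cosine distance (k-medians and spherical k-Means are the usual variants), which is one reason clustering results can differ across distance measures.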