Effects of similarity/distance metrics on k-means algorithm with respect to its applications in IoT and multimedia: a review
Manoj Kumar Gupta, Pravin Chandra
DOI: https://doi.org/10.1007/s11042-021-11255-7
IF: 2.577
2021-09-06
Multimedia Tools and Applications
Abstract: The Internet of Things (IoT) and multimedia are gaining popularity because of their use in a wide range of applications. Numerous sensors and automated devices generate huge volumes of data, which must be analyzed efficiently and effectively. This can be achieved with appropriate machine learning techniques such as clustering. Among clustering techniques, the k-means algorithm is one of the simplest, most effective, and most commonly used. To form clusters, it uses a measure of similarity/distance among the data observations: nearby/similar observations are placed in the same cluster, whereas distant/dissimilar observations are placed in different clusters. Hence, the similarity/distance metric plays a major role in the performance and accuracy of k-means, and choosing an appropriate metric can improve both, which leads to good clusters of data observations, things, images, and so on. K-means is mostly implemented using the Euclidean distance metric. To explore better and/or alternative similarity/distance metrics for k-means, this paper presents a case study, based on empirical evaluation, of thirteen different similarity/distance metrics on six well-known datasets. The results of the empirical study are analyzed and compared using widely used statistical clustering evaluation/validation measures, and the metrics are ranked according to the comparative results. Overall, the results demonstrate that the Manhattan and Minkowski distance metrics give better results for the k-means algorithm.
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering
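
The abstract's central idea, that k-means assigns each observation to its nearest cluster under a chosen similarity/distance metric, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `minkowski` and `kmeans`, the `init` parameter, and the toy data are assumptions made for illustration, and the sketch keeps the usual mean-based centroid update for every metric, which the abstract does not specify. Only the Minkowski family is covered here (order p = 1 is the Manhattan distance, p = 2 the Euclidean distance), not the thirteen metrics of the study.

```python
import random


def minkowski(a, b, p=2.0):
    """Minkowski distance of order p (p=1: Manhattan, p=2: Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def kmeans(points, k, p=2.0, iters=100, seed=0, init=None):
    """Lloyd-style k-means whose assignment step uses a Minkowski metric.

    `init` (hypothetical parameter) lets the caller fix the starting
    centroids; otherwise k distinct points are sampled at random.
    """
    rng = random.Random(seed)
    start = init if init is not None else rng.sample(points, k)
    centroids = [list(c) for c in start]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        # under the chosen metric.
        new_labels = [
            min(range(k), key=lambda j: minkowski(pt, centroids[j], p))
            for pt in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        # (assumption: the mean is kept regardless of the metric).
        for j in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy data: two well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels_l1, _ = kmeans(points, k=2, p=1.0, init=[(0, 0), (10, 10)])  # Manhattan
labels_l2, _ = kmeans(points, k=2, p=2.0, init=[(0, 0), (10, 10)])  # Euclidean
```

Swapping the value of `p` (or replacing `minkowski` with another distance function) changes only the assignment step, which is the kind of substitution the study evaluates empirically. On data this cleanly separated the two metrics agree; differences between metrics would be expected only on less clear-cut data.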