Abstract: The K-means clustering algorithm is a partitional clustering algorithm that has been widely used in many applications for traditional clustering because of its simplicity and low computational complexity. The technique requires the user to specify the number of clusters to generate from the dataset, and this choice affects the clustering results. Moreover, random initialization of the cluster centers can cause the algorithm to converge to a local minimum. Automatic clustering is a recent approach in which the number of clusters does not need to be specified: the natural clusters in a dataset are identified without any background information about the data objects. Nature-inspired metaheuristic optimization algorithms have recently been deployed to overcome the difficulties that traditional clustering algorithms face in automatic data clustering, and some of them have been hybridized with the traditional K-means algorithm to boost its performance and its ability to handle automatic data clustering problems. This study aims to identify, retrieve, summarize, and analyze recently proposed studies on improving the K-means clustering algorithm with nature-inspired optimization techniques. A quest approach for article selection was adopted, which led to the identification and selection of 147 related studies from different reputable academic venues and databases. The analysis revealed that, although the K-means algorithm has been well researched in the literature, it retains clear advantages over several well-established state-of-the-art clustering algorithms in speed, accessibility, simplicity of use, and applicability to clustering problems with unlabeled and nonlinearly separable datasets.
The study also evaluates and discusses some of the well-known weaknesses of the K-means clustering algorithm that the existing improvement methods were designed to address. This systematic review and analysis of the literature on K-means enhancement approaches presents possible perspectives for the clustering analysis research domain and serves as a comprehensive source of information on the K-means algorithm and its variants for the research community.
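
The sensitivity to initialization mentioned in the abstract can be illustrated with a minimal sketch. The code below is a plain Lloyd-style K-means written for illustration only, not any method from the surveyed studies; the function names (`sq_dist`, `centroid`, `kmeans`) and the four-point toy dataset are assumptions introduced here. Initial centers are passed in explicitly so that the effect of a poor starting point is reproducible.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(cluster) for c in zip(*cluster))


def kmeans(points, init_centers, max_iter=100):
    """Run K-means from the given initial centers; return (centers, SSE)."""
    centers = list(init_centers)
    k = len(centers)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [centroid(c) if c else centers[j]
                       for j, c in enumerate(clusters)]
        if new_centers == centers:
            break  # assignments no longer change: converged
        centers = new_centers
    # Sum of squared errors of each point to its nearest final center.
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse


# Hypothetical toy data: two tight pairs of points, far apart on the x-axis.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Both initial centers inside the left pair: converges to a poor local minimum.
centers_bad, sse_bad = kmeans(points, [(0, 0), (0, 1)])
# One initial center in each pair: converges to the better partition.
centers_good, sse_good = kmeans(points, [(0, 0), (4, 0)])

print(sse_bad)   # 16.0
print(sse_good)  # 1.0
```

With the same data and the same number of clusters, the two runs settle on different partitions. Repeated restarts, or a search over candidate initial centers of the kind the hybrid metaheuristic approaches perform, are ways of reducing this dependence on the starting point.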

Automatic Recommendation of a Distance Measure for Clustering Algorithms

Human Factors Based Partitioning Versus Data Clustering for Recommendations

A Hybrid Recommendation Algorithm Based on Clustering and Collaborative Filtering

A Statistical Information-Based Clustering Approach in Distance Space

Rethinking Recommender Systems: Cluster-based Algorithm Selection

Enhancing Time Series Clustering by Incorporating Multiple Distance Measures with Semi-Supervised Learning

Data Clustering: Integrating Different Distance Measures with Modified k-Means Algorithm

A Novel Effective Distance Measure and a Relevant Algorithm for Optimizing the Initial Cluster Centroids of K-means

An Investigation into Distance Measures in Cluster Analysis

A new distance measurement and its application in K-Means Algorithm

Learning Bregman Distance Functions for Semi-Supervised Clustering

A Distance Scaling Method to Improve Density-Based Clustering

Incorporating Community Detection and Clustering Techniques into Collaborative Filtering Model

E-commerce User Recommendation Algorithm Based on Social Relationship Characteristics and Improved K-Means Algorithm

A Feature Subset Selection Algorithm Automatic Recommendation Method

The Exploitation of Distance Distributions for Clustering

A Comparison Study on Similarity and Dissimilarity Measures in Clustering Continuous Data

Efficient Clustering with Limited Distance Information

Quantifying Distances Between Clusters with Elliptical or Non-Elliptical Shapes

Automatic Parameter Selection for Non-Redundant Clustering

K-Means-Based Nature-Inspired Metaheuristic Algorithms for Automatic Data Clustering Problems: Recent Advances and Future Directions