Performance Evaluation of Threshold-Based and k-means Clustering Algorithms Using Iris Dataset

Mamta Mittal, Rajendra Kumar Sharma, Varinder Pal Singh
DOI: https://doi.org/10.2174/1872212112666180510153006
2019-05-27
Recent Patents on Engineering
Abstract: Background: Clustering is a data mining tool that partitions raw data into disjoint clusters. Researchers have developed many algorithms to cluster large data sets based on specific parameters. Objective: This study centers on the popular partitioning-based technique k-means. It requires the number of clusters as an input parameter, it does not provide a global solution to the problem, and it is sensitive to outliers and to initial seed selection. Methods: In this paper, the authors discuss a threshold-based clustering method, a single-pass method, which overcomes the above limitations but requires a threshold value as an input parameter. Related work on k-means published in patent form is also noteworthy and paves the way for further research. Results: To assess the quality of clustering, numerous validity measures and indices were evaluated on the Iris dataset for both the k-means and the threshold-based clustering algorithms. The experiments show that the threshold-based method generates more separated and more compact clusters, and that the validity indices improve significantly. Conclusion: Threshold-based clustering generates clusters automatically, is not sensitive to initial seed selection or outliers, and is more scalable. It will be an efficient approach to partitioning-based clustering whenever the threshold value is selected carefully or new functions are proposed for deciding its value.
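The abstract does not spell out the single-pass, threshold-based procedure, so the following is only a minimal sketch of one common variant (often called leader clustering), not the authors' implementation. It assumes Euclidean distance, running-mean centroids, and a rule that a point joins the nearest cluster when its centroid lies within the threshold. The function name `threshold_clustering` and the sample points are hypothetical and are not taken from the Iris dataset.

```python
import math

def threshold_clustering(points, threshold):
    """Single-pass sketch: a point joins the nearest existing cluster if that
    cluster's centroid is within `threshold`; otherwise it starts a new one."""
    centroids = []  # running mean of each cluster
    counts = []     # number of points assigned to each cluster
    labels = []     # cluster index for each input point, in input order
    for p in points:
        # Find the nearest existing centroid (assumed Euclidean distance).
        best, best_dist = None, float("inf")
        for i, c in enumerate(centroids):
            d = math.dist(p, c)
            if d < best_dist:
                best, best_dist = i, d
        if best is not None and best_dist <= threshold:
            # Assign to the nearest cluster and update its running mean.
            counts[best] += 1
            n = counts[best]
            centroids[best] = [ci + (xi - ci) / n
                               for ci, xi in zip(centroids[best], p)]
            labels.append(best)
        else:
            # No cluster is close enough: this point seeds a new cluster.
            centroids.append(list(p))
            counts.append(1)
            labels.append(len(centroids) - 1)
    return labels, centroids

# Made-up 2-D points for illustration only.
sample = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.0), (5.1, 4.8), (0.9, 1.1)]
labels, centroids = threshold_clustering(sample, threshold=1.0)
print(labels)  # [0, 0, 1, 1, 0]
```

In this sketch the number of clusters is not supplied in advance, unlike k-means; it follows from the threshold, so a smaller threshold yields more clusters and a larger one yields fewer.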