An innovative clustering approach utilizing frequent item sets

DOI: https://doi.org/10.1007/s11042-024-18913-6
IF: 2.577
2024-04-27
Multimedia Tools and Applications
Abstract: Clustering is a data mining method that belongs to the category of unsupervised learning. Cluster analysis groups data into classes by identifying the internal organization of the objects in a dataset and the relationships among them. Many clustering methods are designed around specific assumptions about the underlying data distribution or cluster shapes. If these assumptions do not hold for a particular dataset, the performance of the clustering algorithm may suffer. This paper introduces a novel frequency clustering method called CFI (Clustering based on Frequent Itemsets). The approach opens a new avenue for research in frequency clustering, departing from conventional distance-based methods. CFI has the potential to reveal compelling patterns or associations among the features in the data. The CFI algorithm consists of three main steps. First, frequent itemsets are generated. Second, the centroid of each cluster is built using a new measure called FI-distance, which combines the Euclidean distance with a similarity measure for itemsets. Third, each object is assigned to the appropriate cluster based on its membership degree. Experiments were conducted on synthetic and real-world datasets using three performance criteria: the Davies-Bouldin score, the silhouette width criterion, and the Calinski-Harabasz score. CFI was first compared with state-of-the-art methods and then with automatic clustering methods. The results indicate the superiority of the CFI algorithm over the other methods.
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering
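The abstract describes CFI's three steps only at a high level and does not give the FI-distance formula or the membership function. The sketch below is one possible reading, and every concrete choice in it is an assumption rather than the authors' method. Numeric features (assumed scaled to [0, 1]) are discretized into equal-width bins so that each object becomes a set of items. Frequent itemsets are mined with a plain Apriori-style level-wise search. Each of the k most frequent maximal itemsets seeds one centroid, taken as the mean of the objects that contain it. The hypothetical `fi_distance` mixes Euclidean distance with Jaccard dissimilarity through a weight `alpha`, and membership degrees follow the fuzzy c-means formula. As a simplification, FI-distance is used here only when scoring objects against the itemset-seeded centroids. The names `to_items`, `frequent_itemsets`, `fi_distance`, `memberships`, and `cfi_sketch`, along with the parameters `bins`, `min_support`, `alpha`, and `m`, are all hypothetical.

```python
import math
from itertools import combinations


def to_items(x, bins=2):
    """Hypothetical discretization: feature j with value v becomes item (j, bin).

    Assumes every feature is already scaled to [0, 1].
    """
    return frozenset((j, min(max(int(v * bins), 0), bins - 1)) for j, v in enumerate(x))


def frequent_itemsets(transactions, min_support):
    """Plain Apriori-style level-wise search; returns {itemset: support}."""
    n = len(transactions)
    level = [frozenset([i]) for i in {i for t in transactions for i in t}]
    result, k = {}, 1
    while level:
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        frequent = {c: s / n for c, s in counts.items() if s / n >= min_support}
        result.update(frequent)
        k += 1
        # Join pairs of frequent (k-1)-itemsets into candidate k-itemsets.
        level = list({a | b for a, b in combinations(frequent, 2) if len(a | b) == k})
    return result


def fi_distance(x, items_x, centroid, itemset, alpha=0.5):
    """Hypothetical FI-distance: alpha * Euclidean + (1 - alpha) * Jaccard dissimilarity."""
    jaccard = len(items_x & itemset) / len(items_x | itemset)
    return alpha * math.dist(x, centroid) + (1 - alpha) * (1 - jaccard)


def memberships(dists, m=2.0):
    """Fuzzy c-means style membership degrees (an assumed choice, fuzzifier m > 1)."""
    zeros = [i for i, d in enumerate(dists) if d == 0]
    if zeros:  # an object lying exactly on a centroid belongs to it fully
        return [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(dists))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** p for dk in dists) for di in dists]


def cfi_sketch(X, k, min_support=0.3, alpha=0.5, bins=2):
    trans = [to_items(x, bins) for x in X]
    # Step 1: mine frequent itemsets.
    freq = frequent_itemsets(trans, min_support)
    # Step 2: seed one centroid per maximal frequent itemset (top k by support).
    maximal = [s for s in freq if not any(s < t for t in freq)]
    seeds = sorted(maximal, key=lambda s: (-freq[s], -len(s), sorted(s)))[:k]
    centroids = []
    for s in seeds:
        members = [x for x, t in zip(X, trans) if s <= t]
        centroids.append([sum(col) / len(members) for col in zip(*members)])
    # Step 3: assign each object to the cluster with the highest membership degree.
    labels = []
    for x, t in zip(X, trans):
        d = [fi_distance(x, t, c, s, alpha) for c, s in zip(centroids, seeds)]
        u = memberships(d)
        labels.append(max(range(len(u)), key=u.__getitem__))
    return labels, seeds


X = [[0.1, 0.2], [0.15, 0.1], [0.2, 0.15], [0.9, 0.8], [0.85, 0.95], [0.8, 0.9]]
labels, seeds = cfi_sketch(X, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

On this toy data, the two maximal frequent itemsets, {(0, 0), (1, 0)} and {(0, 1), (1, 1)}, each describe one group of points. Because the Jaccard term rewards sharing items with a seed itemset, objects are pulled toward the cluster whose defining pattern they exhibit, in addition to being pulled toward the geometrically nearest centroid.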