Integrating Curriculum Learning With k-Means: A Data-Centric Approach to Faster Clustering

Abdul Majeed, Seong Oun Hwang
DOI: https://doi.org/10.1109/mitp.2024.3405857
2024-11-27
IT Professional
Abstract: k-means clustering is a very popular method that groups n observations into k clusters by assigning each observation to the cluster with the nearest mean, where that mean serves as the prototype of the cluster. Although k-means yields desirable results in most cases, its computing overhead is very high even for modest-size datasets, which makes it unsuitable for larger datasets. To lower the computing overhead without degrading performance, in this article, we propose and implement a curriculum learning (CL)-integrated k-means clustering method for efficiently clustering observations. In our method, the data to be clustered via k-means are evaluated beforehand and sorted by complexity using the CL approach. By applying CL, the portion of the data that is naturally clustered is identified and bypassed by k-means, so that only the complex portions of the data undergo processing, leading to a significant reduction in computing overhead.
computer science, information systems, telecommunications, software engineering
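
The pipeline described in the abstract (score the data by complexity, sort it, let k-means skip the naturally clustered portion) can be illustrated with a minimal Python sketch. The abstract does not say how complexity is measured, so everything specific here is an assumption made for illustration and not the authors' method: the score is the ratio of a point's distance to its nearest and second-nearest centroid, the `threshold` cutoff and the caller-supplied `init_centroids` are hypothetical parameters, and the function names are invented. The sketch needs at least two centroids.

```python
import math


def nearest_two(point, centroids):
    """Return (nearest centroid index, nearest distance, second-nearest distance)."""
    ranked = sorted((math.dist(point, c), i) for i, c in enumerate(centroids))
    return ranked[0][1], ranked[0][0], ranked[1][0]


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def cl_kmeans(points, init_centroids, threshold=0.3, iters=20):
    """Curriculum-style k-means sketch (assumed scoring rule, not the paper's)."""
    centroids = [tuple(c) for c in init_centroids]
    k = len(centroids)

    # Curriculum step: score every point, then sort from easy to hard.
    # Assumed score: d_nearest / d_second, near 0 = clear, near 1 = ambiguous.
    scored = []
    for idx, p in enumerate(points):
        label, d1, d2 = nearest_two(p, centroids)
        score = d1 / d2 if d2 > 0 else 1.0
        scored.append((score, idx, label))
    scored.sort()

    # Easy points are labelled once and never revisited by the iterations.
    labels = [None] * len(points)
    easy = [[] for _ in range(k)]
    hard = []
    for score, idx, label in scored:
        if score < threshold:
            labels[idx] = label
            easy[label].append(points[idx])
        else:
            hard.append(idx)

    # Lloyd-style iterations reassign only the hard points; the frozen easy
    # points still contribute to each centroid update.
    for _ in range(iters):
        members = [list(group) for group in easy]
        changed = False
        for idx in hard:
            label = nearest_two(points[idx], centroids)[0]
            changed = changed or label != labels[idx]
            labels[idx] = label
            members[label].append(points[idx])
        centroids = [mean(m) if m else c for m, c in zip(members, centroids)]
        if not changed:
            break
    return labels, centroids, len(hard)


# Two tight groups plus two ambiguous points between them.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 9), (9, 10), (4, 4), (6, 6)]
labels, centroids, n_hard = cl_kmeans(points, [(1, 1), (9, 9)])
print(labels, centroids, n_hard)
```

In this toy input only the two in-between points are reassigned inside the loop, which is where the saving would come from under these assumptions; how much is saved on real data depends on the scoring rule and cutoff, which the abstract leaves unspecified.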