Inverse-degree Sampling for Spectral Clustering

Haidong Gao, Yueting Zhuang, Fei Wu, Jian Shao
DOI: https://doi.org/10.1109/icig.2011.54
2011-01-01
Abstract: Among classical clustering algorithms, spectral clustering performs much better than K-means in most cases. However, because of its cubic time complexity, spectral clustering is rarely used on large-scale data sets. Sampling-based methods such as the Nyström method and column sampling have been proposed as potential approaches to this challenge. Current sampling-based methods often use uniform or other random sampling policies to select representative data, and therefore tend to disregard the data in small clusters. This paper proposes an unbiased sampling framework, derives from it a new sampling method called inverse-degree sampling, and introduces an entropy criterion that gives a simple theoretical justification for the method. When representative data are selected by inverse-degree sampling, the time complexity of spectral clustering drops from cubic to quadratic. Experiments on both toy data and real-world data demonstrate good sampling performance and comparable clustering quality.
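The abstract does not state the sampling formula, so the following is only a minimal sketch of the general idea under stated assumptions: the affinity matrix is a Gaussian kernel, a point's degree is its row sum in that matrix, and each point is drawn with probability proportional to the inverse of its degree. The function name `inverse_degree_sample` and the parameters `sigma` and `seed` are hypothetical and do not come from the paper.

```python
import numpy as np


def inverse_degree_sample(X, m, sigma=1.0, seed=0):
    """Draw m distinct row indices of X with probability proportional to 1/degree.

    Assumptions (not taken from the paper): Gaussian affinity with bandwidth
    sigma, and degree defined as the row sum of the affinity matrix.
    """
    rng = np.random.default_rng(seed)

    # Pairwise squared Euclidean distances; this step is O(n^2) in time and memory.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)

    # Gaussian affinity matrix with self-affinity removed.
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Points in large, dense clusters have high degree and get low probability.
    degree = W.sum(axis=1)
    inv = 1.0 / (degree + 1e-12)  # small constant guards against isolated points
    p = inv / inv.sum()

    idx = rng.choice(len(X), size=m, replace=False, p=p)
    return idx, p
```

Under these assumptions, points in a small cluster have fewer close neighbours, hence a lower degree and a higher chance of being selected than points in a large cluster, which is the bias against small clusters that the abstract says uniform sampling suffers from. Computing all degrees already costs quadratic time, which is in line with the quadratic complexity mentioned in the abstract.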