A Nyström Spectral Clustering Algorithm Based on Probability Incremental Sampling

Hongjie Jia, Shifei Ding, Mingjing Du
DOI: https://doi.org/10.1007/s00500-016-2160-8
IF: 3.732
2016-01-01
Soft Computing
Abstract: Spectral clustering maps the data points of the original space into a low-dimensional eigen-space in which they become linearly separable, so it can handle data with complex structures. However, spectral clustering must store the entire similarity matrix and perform an eigen-decomposition. Both steps consume considerable time and memory, which limits the application of spectral clustering to large-scale data. To reduce the complexity of spectral clustering, the Nyström extension technique can be used to compute approximate eigenvectors from a small sample of data points. This method trades some clustering accuracy for improved efficiency. To select more representative sample points that better reflect the distribution of the data set, this paper designs a dynamic incremental sampling method for Nyström spectral clustering, in which data points are sampled according to different probability distributions, and we theoretically prove that increasing the number of sampling rounds effectively decreases the sampling error. The feasibility and effectiveness of the proposed algorithm are analyzed through experiments on UCI machine learning data sets.
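The abstract does not specify the sampling distributions, so the following is only a minimal sketch of Nyström spectral clustering with incremental, probability-weighted sampling, under these assumptions: a Gaussian (RBF) similarity with bandwidth sigma, a first batch drawn uniformly, and each later batch drawn with probability proportional to a point's Nyström residual K_ii - (C W^+ C^T)_ii. That residual-weighted rule is a stand-in chosen for illustration, not necessarily the paper's scheme, and all function names and parameters (m, rounds, sigma, seed) are hypothetical.

```python
import numpy as np


def rbf_kernel(X, Y, sigma):
    """Gaussian similarity exp(-||x - y||^2 / (2 sigma^2)) between rows of X and Y."""
    d2 = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))


def inv_sqrt_psd(W, rel_tol=1e-6):
    """W^{-1/2} on the dominant eigenspace of a PSD matrix (tiny eigenvalues dropped)."""
    vals, vecs = np.linalg.eigh(W)
    keep = vals > rel_tol * vals.max()
    return (vecs[:, keep] / np.sqrt(vals[keep])) @ vecs[:, keep].T


def incremental_sample(X, m, rounds, sigma, rng):
    """Hypothetical sampler: uniform first batch, then residual-weighted batches."""
    n = X.shape[0]
    batch = m // rounds
    idx = rng.choice(n, size=batch, replace=False)
    for _ in range(rounds - 1):
        C = rbf_kernel(X, X[idx], sigma)      # n x |idx| block of the similarity matrix
        M = C @ inv_sqrt_psd(C[idx])          # M M^T = C W^+ C^T (Nystrom approximation)
        resid = np.clip(1.0 - np.sum(M**2, axis=1), 0.0, None)  # K_ii = 1 for RBF
        resid[idx] = 0.0                      # never resample an already chosen point
        if resid.sum() <= 0.0:
            break
        p = resid / resid.sum()               # poorly approximated points are favoured
        size = min(batch, int(np.count_nonzero(p)))
        idx = np.concatenate([idx, rng.choice(n, size=size, replace=False, p=p)])
    return idx


def kmeans(Y, k, iters=50):
    """Plain Lloyd k-means with deterministic farthest-point initialisation."""
    centers = [Y[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((Y[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        new = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def nystrom_spectral_clustering(X, k, m, rounds=3, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    idx = incremental_sample(X, m, rounds, sigma, rng)
    C = rbf_kernel(X, X[idx], sigma)
    M = C @ inv_sqrt_psd(C[idx])                      # affinity A approx M M^T
    d = M @ M.sum(axis=0)                             # approximate degrees A @ 1
    Mn = M / np.sqrt(np.maximum(d, 1e-12))[:, None]   # D^-1/2 A D^-1/2 approx Mn Mn^T
    U, _, _ = np.linalg.svd(Mn, full_matrices=False)  # left singular vectors = eigenvectors
    Y = U[:, :k]
    Y = Y / np.maximum(np.linalg.norm(Y, axis=1, keepdims=True), 1e-12)  # row-normalise
    return kmeans(Y, k), idx
```

Only an n x m block of the similarity matrix is ever formed, and the spectral step is an SVD of an n x m matrix rather than an eigen-decomposition of the full n x n matrix, which is where the time and memory savings described above come from. If the uniform first batch happens to miss a cluster, that cluster's points have residuals near 1 and dominate the next round's sampling probabilities.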