Robust Landmark Graph-Based Clustering for High-Dimensional Data

Ben Yang, Jinghan Wu, Aoran Sun, Naying Gao, Xuetao Zhang
DOI: https://doi.org/10.1016/j.neucom.2022.05.011
IF: 6
2022-01-01
Neurocomputing
Abstract: High-dimensional data have attracted much attention because they carry more comprehensive information about samples, and clustering such data has become a crucial topic in unsupervised learning. Existing clustering methods often show limited applicability because of their high computational complexity and poor noise robustness. To address these issues, we propose a novel robust landmark graph-based clustering algorithm for high-dimensional data (RLGCH), which inherits the advantages of both k-means++ and graph-based clustering by using the results of k-means++ as pseudo labels for landmark graph-based clustering. In particular, because RLGCH performs k-means++ in the low-dimensional space and landmark graph-based spectral clustering in the original feature space, it can achieve more reasonable clustering effectiveness than methods that operate only in the low-dimensional space or only in the original space. To avoid post-processing after optimization, the embedded factor matrix is constrained to be an indicator matrix rather than a simple nonnegative matrix. To enhance clustering robustness, the L2,1-norm is adopted to minimize the discrepancy between the results of k-means++ and landmark graph-based clustering. To solve the RLGCH model, we establish a novel efficient optimization strategy that obtains all sample categories directly. Combining our clustering model with this optimization strategy reduces the computational complexity to linear and makes it insensitive to the data dimensionality. Extensive experiments on seven real-world datasets and sixteen noisy datasets show that, compared with other state-of-the-art methods, RLGCH greatly improves clustering efficiency and robustness while guaranteeing comparable or even better clustering effectiveness.
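The abstract credits the L2,1-norm for the method's robustness without defining it. The sketch below is not the RLGCH objective. It is a minimal illustration that assumes the common row-wise convention, in which the L2,1-norm of a residual matrix is the sum of the Euclidean norms of its rows. The matrix `E` and the helper names `l21_norm` and `squared_frobenius` are hypothetical and chosen only for illustration.

```python
import numpy as np


def l21_norm(E):
    """Sum of the Euclidean norms of the rows of E (row-wise convention, an assumption here)."""
    return float(np.sum(np.linalg.norm(E, axis=1)))


def squared_frobenius(E):
    """Sum of squared entries, the usual least-squares error."""
    return float(np.sum(E ** 2))


# Hypothetical residual matrix: one row per sample.
# The last row plays the role of a noisy sample with a large residual.
E = np.array([
    [0.0, 1.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [3.0, 4.0],
])

# Per-sample residual norms: 1, 1, 1 and 5.
row_norms = np.linalg.norm(E, axis=1)

# Share of the total error contributed by the noisy sample under each measure.
share_l21 = row_norms[-1] / l21_norm(E)
share_fro = row_norms[-1] ** 2 / squared_frobenius(E)

print(f"L2,1 norm: {l21_norm(E):.1f}, noisy-sample share: {share_l21:.3f}")
print(f"Squared Frobenius norm: {squared_frobenius(E):.1f}, noisy-sample share: {share_fro:.3f}")
```

Because each sample's residual enters the L2,1-norm unsquared, the noisy row accounts for a smaller share of the total error than it does under the squared Frobenius norm, which is the usual argument for preferring it when some samples are corrupted.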