Abstract: High-dimensional data has attracted considerable attention because it carries more comprehensive information about samples, and clustering such data has become a crucial topic in unsupervised learning. Existing clustering methods often have limited applicability because of their high computational complexity and poor robustness to noise. To address this issue, we propose a novel robust landmark graph-based clustering algorithm for high-dimensional data (RLGCH), which inherits the advantages of both k-means++ and graph-based clustering by using the results of k-means++ as pseudo labels for landmark graph-based clustering. In particular, because RLGCH performs k-means++ in the low-dimensional space and landmark graph-based spectral clustering in the original feature space, it can achieve more reasonable clustering results than methods that operate only in the low-dimensional space or only in the original space. To avoid post-processing after optimization, the embedded factor matrix is constrained to be an indicator matrix rather than merely a nonnegative matrix. To enhance clustering robustness, the L2,1-norm is adopted to minimize the discrepancy between the results of k-means++ and those of landmark graph-based clustering. To solve the RLGCH model, we establish a novel, efficient optimization strategy that obtains the categories of all samples directly. Combining this clustering model with the optimization strategy reduces the computational complexity to linear and makes it insensitive to data dimensionality. Extensive experiments on seven real-world datasets and sixteen noisy datasets show that, compared with other state-of-the-art methods, RLGCH greatly improves clustering efficiency and robustness while maintaining comparable or even better clustering effectiveness.
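
The abstract names the ingredients of RLGCH but not its objective function or solver. What follows is therefore only a minimal NumPy sketch of how those ingredients could fit together under stated assumptions. It is not the authors' optimization strategy. All of the following are hypothetical choices:
- The low-dimensional space is obtained by PCA.
- The landmarks are k-means++ seeds drawn in the original feature space.
- Each sample connects to its s nearest landmarks with Gaussian weights, giving an n × m matrix Z.
- Pseudo labels are propagated through the anchor-graph style affinity W = Z Λ⁻¹ Zᵀ with Λ = diag(Zᵀ1).
- A row-wise argmax stands in for the indicator-matrix constraint.

The L2,1-norm, ‖E‖₂,₁ = Σᵢ ‖eᵢ‖₂ (the sum of the Euclidean norms of the rows of E), measures the discrepancy between the pseudo labels and the graph-based labels. The function names are hypothetical.

```python
import numpy as np


def kmeans_pp_init(X, k, rng):
    """k-means++ seeding: each new center is drawn with probability
    proportional to its squared distance from the nearest chosen center."""
    n = X.shape[0]
    centers = [X[rng.integers(n)]]
    for _ in range(1, k):
        C = np.asarray(centers)
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1).min(axis=1)
        centers.append(X[rng.choice(n, p=d2 / d2.sum())])
    return np.asarray(centers, dtype=float)


def kmeans_pp(X, k, n_iter=50, seed=0):
    """Lloyd iterations started from k-means++ seeds; returns hard labels."""
    rng = np.random.default_rng(seed)
    C = kmeans_pp_init(X, k, rng)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = X[labels == j].mean(axis=0)
    return labels


def indicator(labels, k):
    """Indicator matrix: row i has a single 1 in column labels[i]."""
    Y = np.zeros((labels.size, k))
    Y[np.arange(labels.size), labels] = 1.0
    return Y


def l21_norm(E):
    """L2,1-norm: sum of the Euclidean norms of the rows of E."""
    return float(np.linalg.norm(E, axis=1).sum())


def landmark_graph(X, landmarks, s=3):
    """Hypothetical construction: each sample links to its s nearest landmarks
    with Gaussian weights; each row is normalized to sum to 1."""
    d2 = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(axis=-1)
    nn = np.argsort(d2, axis=1)[:, :s]
    rows = np.arange(X.shape[0])[:, None]
    near = d2[rows, nn]
    sigma = near.mean() + 1e-12
    Z = np.zeros_like(d2)
    # Subtracting each row's minimum avoids underflow; it cancels after normalization.
    Z[rows, nn] = np.exp(-(near - near.min(axis=1, keepdims=True)) / sigma)
    return Z / Z.sum(axis=1, keepdims=True)


def landmark_clustering_sketch(X, k, n_landmarks=20, dim=2, seed=0):
    """Illustrative pipeline only; NOT the published RLGCH optimization."""
    # 1) Pseudo labels: k-means++ in a low-dimensional space (PCA is an assumption).
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Y0 = indicator(kmeans_pp(Xc @ Vt[:dim].T, k, seed=seed), k)
    # 2) Landmark graph in the original feature space (landmarks = k-means++ seeds).
    landmarks = kmeans_pp_init(X, n_landmarks, np.random.default_rng(seed))
    Z = landmark_graph(X, landmarks)
    # 3) Propagate pseudo labels through W = Z diag(Z^T 1)^{-1} Z^T without forming W.
    lam = Z.sum(axis=0) + 1e-12
    F = Z @ ((Z.T @ Y0) / lam[:, None])
    # 4) Discrete labels via row-wise argmax (stand-in for the indicator constraint).
    labels = F.argmax(axis=1)
    # 5) Robust L2,1 discrepancy between pseudo labels and graph-based labels.
    return labels, l21_norm(Y0 - indicator(labels, k))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0.0, 1.0, (60, 50)), rng.normal(10.0, 1.0, (60, 50))])
    labels, err = landmark_clustering_sketch(X, k=2)
    print(np.bincount(labels), err)
```

Because W is never formed, the propagation step in this sketch costs O(nmc) for n samples, m landmarks, and c clusters. That is linear in n when m and c are fixed. The L2,1-norm charges a corrupted row in proportion to its norm r, whereas a squared Frobenius loss charges r². As a result, a few badly mislabeled or noisy samples have less influence on the error.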

Robust Landmark Graph-Based Clustering for High-Dimensional Data
