Abstract: Applications in many domains, such as text mining and natural language processing, need to deal with high-dimensional data. High-dimensional data may exhibit better clustering characteristics on a selected low-dimensional subspace. Subspace clustering projects the data onto a low-dimensional subspace before clustering. Traditional subspace clustering methods employ eigenvalue decomposition to find the projection of the input data and then perform K-means or kernel K-means to obtain the clustering matrix. Such methods are not only inefficient but also rely on a two-step procedure that yields only an approximate solution. Although Discriminative K-means (DisKmeans) integrates dimensionality reduction and clustering into a joint framework and solves the optimization problem by kernel K-means, it must iteratively find the centroids in the kernel space and the class labels, and it has quadratic time complexity. Accordingly, in this paper, we propose an algorithm, namely Fast DisKmeans (FDKM), that obtains the cluster indicator matrix directly. Moreover, our proposed method has linear time complexity, a significant reduction compared with the quadratic time complexity of DisKmeans. We also demonstrate that solving the objective function of DisKmeans is equivalent to representing the cluster assignment matrix by a low-dimensional linear mapping of the data. Based on this observation, we propose a second algorithm, namely Iterative Fast DisKmeans (IFDKM), which also has linear time complexity. A series of experiments was conducted on several datasets, and the experimental results showed the superior performance of FDKM and IFDKM.
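
The traditional two-step baseline mentioned in the abstract (an eigenvalue decomposition to find a projection, followed by K-means on the projected data) can be sketched as below. This is only an illustrative sketch under stated assumptions: it uses PCA on the sample covariance as the eigen-decomposition step and plain Lloyd iterations for K-means, and the function names `pca_project`, `kmeans`, and `two_step_subspace_clustering` are hypothetical. It is not FDKM or IFDKM, whose details the abstract does not give.

```python
import numpy as np


def pca_project(X, n_components):
    """Project rows of X onto the top eigenvectors of the sample covariance.

    Assumption: PCA stands in for the eigen-decomposition step.
    """
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(X) - 1)
    # eigh returns eigenvalues in ascending order for symmetric matrices,
    # so reverse the columns to put the leading eigenvectors first.
    _, eigvecs = np.linalg.eigh(cov)
    W = eigvecs[:, ::-1][:, :n_components]
    return Xc @ W


def kmeans(Z, k, n_iter=100, seed=0):
    """Plain Lloyd's K-means; returns integer labels and centroids."""
    rng = np.random.default_rng(seed)
    centroids = Z[rng.choice(len(Z), size=k, replace=False)]
    labels = np.zeros(len(Z), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster is empty.
        new_centroids = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


def two_step_subspace_clustering(X, k, n_components):
    """Step 1: project to a subspace. Step 2: cluster in that subspace."""
    Z = pca_project(X, n_components)
    labels, _ = kmeans(Z, k)
    return labels
```

Because the projection is fixed before clustering starts, the subspace cannot adapt to the cluster assignments, which is the approximation that joint formulations such as DisKmeans aim to avoid.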

Subspace Clustering Algorithm Based on k Most Similar Clustering

Subspace Clustering by Directly Solving Discriminative K-means

Subspace Clustering for Vector Clusters

A New Subspace Clustering Algorithm

Co-Referenced Subspace Clustering

A Subspace Clustering Algorithm for High Dimensional Data Based on Similar Dimension

A Subspace Clustering Algorithm for High Dimensional Spatial Data

A Two-Step Non-redundant Subspace Clustering Approach

Efficient Direct Structured Subspace Clustering

Subspace Clustering with $K$-Support Norm

Subspace Clustering Via Good Neighbors

A Fuzzy K-modes-based Algorithm for Soft Subspace Clustering

Local Subspace Clustering

An Improved Subspace Clustering Algorithm Based on Sparse Representation

Efficient Approaches for Summarizing Subspace Clusters into K Representatives

DSKmeans: A new kmeans-type approach to discriminative subspace clustering

Information Theoretic Subspace Clustering

Subspace Clustering Through Attribute Clustering

Fast Subspace Clustering Based on the Kronecker Product

Overlapping Subspace Clustering Based on Local Weighted Least Squares Regression

An Entropy Weighting K-Means Algorithm for Subspace Clustering of High-Dimensional Sparse Data