Abstract: Applications in many domains, such as text mining and natural language processing, must deal with high-dimensional data. High-dimensional data may exhibit better clustering characteristics on a selected low-dimensional subspace. Subspace clustering projects the data onto a low-dimensional subspace before clustering. Traditional subspace clustering methods employ eigenvalue decomposition to find the projection of the input data and then perform K-means or kernel K-means to obtain the clustering matrix. Such methods are not only inefficient but also rely on a two-step procedure that yields only an approximate solution. Although Discriminative K-means (DisKmeans) integrates dimensionality reduction and clustering into a joint framework and solves the optimization problem by kernel K-means, it must iteratively find the centroids in the kernel space and the class labels, and it has quadratic time complexity. Accordingly, in this paper we propose an algorithm, Fast DisKmeans (FDKM), that obtains the cluster indicator matrix directly. Moreover, the proposed method has linear time complexity, a significant reduction compared with the quadratic time complexity of DisKmeans. We also demonstrate that solving the objective function of DisKmeans is equivalent to representing the cluster assignment matrix by a low-dimensional linear mapping of the data. Based on this observation, we propose a second algorithm, Iterative Fast DisKmeans (IFDKM), which also has linear time complexity. A series of experiments on several datasets demonstrates the superior performance of FDKM and IFDKM.
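
For orientation, the traditional two-step baseline that the abstract contrasts against (eigenvalue decomposition to find a projection, then K-means in the projected space) might be sketched as follows. This is not FDKM or IFDKM, whose details are not given here. The sketch assumes PCA on the covariance matrix as the eigendecomposition step, plain Lloyd's K-means, and a deterministic farthest-first initialization; all function names and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np

def pca_project(X, dim):
    """Step 1 (assumed to be PCA): project the centered rows of X onto
    the top-`dim` eigenvectors of the sample covariance matrix."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(X) - 1)
    _, eigvecs = np.linalg.eigh(cov)        # eigenvalues come back ascending
    W = eigvecs[:, ::-1][:, :dim]           # keep the leading eigenvectors
    return Xc @ W

def kmeans(Z, k, n_iter=100):
    """Step 2: Lloyd's K-means with farthest-first initialization
    (an illustrative choice that keeps the example deterministic)."""
    centroids = [Z[0]]
    for _ in range(1, k):
        # squared distance from each point to its nearest chosen centroid
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(Z[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        dists = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # recompute each centroid; keep the old one if a cluster is empty
        new_centroids = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels

def two_step_subspace_clustering(X, k, dim):
    """Project first, cluster second: the two steps are solved separately."""
    return kmeans(pca_project(X, dim), k)

# Synthetic data (hypothetical): two groups in 50 dimensions that differ
# only in the first 5 coordinates.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 50))
X[30:, :5] += 10.0
labels = two_step_subspace_clustering(X, k=2, dim=2)
```

Because the projection is fixed before clustering starts, the cluster assignments cannot influence the choice of subspace, which is the limitation that joint formulations such as DisKmeans are meant to address.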

Mining Representative Subspace Clusters in High-dimensional Data

Efficient Approaches for Summarizing Subspace Clusters into K Representatives

Mining Maximal Pattern-Based Subspace Clusters in High Dimensional Space

Towards a Compact and Effective Representation for Datasets with Inhomogeneous Clusters

Discovering the Skyline of Subspace Clusters in High-Dimensional Data

A Unified Framework for Representation-Based Subspace Clustering of Out-of-Sample and Large-Scale Data

Subspace Clustering by Directly Solving Discriminative K-means

Progressive Subspace Skyline Clusters Mining On High Dimensional Data

Effective algorithm for maximal pattern-based subspace clustering

Revealing True Subspace Clusters in High Dimensions

Robust Subspace Clustering Via Thresholding Ridge Regression

Enhanced Locality Sensitive Clustering in High Dimensional Space

Semi-supervised Hierarchical Clustering Analysis for High Dimensional Data

Towards effective and efficient mining of arbitrary shaped clusters

Dimension Reconstruction for Visual Exploration of Subspace Clusters in High-Dimensional Data

An Effective Maximal Subspace Clustering Algorithm Based on Enumeration Tree

Subspace maximum margin clustering

Soft Subspace Fuzzy Clustering with Dimension Affinity Constraint

Provable Data Clustering via Innovation Search

Sparse-Dense Subspace Clustering

Mining maximal correlated member clusters in high dimensional database