Abstract: Applications in many domains, such as text mining and natural language processing, must deal with high-dimensional data. Such data may exhibit better clustering characteristics on a selected low-dimensional subspace. Subspace clustering therefore projects the data onto a low-dimensional subspace before clustering. Traditional subspace clustering methods use eigenvalue decomposition to find the projection of the input data and then run K-means or kernel K-means to obtain the clustering matrix. These methods are inefficient. They also rely on a two-step procedure that yields only an approximate solution. Discriminative K-means (DisKmeans) integrates dimensionality reduction and clustering into a joint framework and solves the optimization problem with kernel K-means. However, it must iteratively update both the centroids in the kernel space and the class labels, and it has quadratic time complexity. In this paper, we therefore propose an algorithm, Fast DisKmeans (FDKM), that obtains the cluster indicator matrix directly. FDKM has linear time complexity, a significant reduction from the quadratic time complexity of DisKmeans. We also show that solving the objective function of DisKmeans is equivalent to representing the cluster assignment matrix by a low-dimensional linear mapping of the data. Based on this observation, we propose a second algorithm, Iterative Fast DisKmeans (IFDKM), which also has linear time complexity. Experiments on several datasets show the superior performance of FDKM and IFDKM.
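To make the "traditional" two-step pipeline described above concrete, here is a minimal sketch of that baseline. It is not FDKM or IFDKM, whose details are not given here. Several simplifications are assumptions made for illustration. The projection is onto a one-dimensional subspace: the leading eigenvector of the covariance matrix, found by power iteration rather than a full eigenvalue decomposition. The second step is plain K-means with deterministic initialization. All function names and the toy data are hypothetical.

```python
def center(points):
    """Subtract the per-feature mean from every point."""
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    return [[p[j] - means[j] for j in range(d)] for p in points]


def covariance(centered):
    """Covariance matrix (divided by n) of mean-centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(p[i] * p[j] for p in centered) / n for j in range(d)]
            for i in range(d)]


def leading_eigenvector(matrix, iters=200):
    """Approximate the top eigenvector of a symmetric PSD matrix by power iteration.

    Assumption: stands in for the full eigenvalue decomposition of the baseline.
    """
    d = len(matrix)
    v = [1.0] * d  # fixed start; assumed not orthogonal to the top eigenvector
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break  # zero matrix: any direction is an eigenvector
        v = [x / norm for x in w]
    return v


def kmeans_1d(values, k, iters=100):
    """Lloyd's K-means on scalars with evenly spaced initial centroids."""
    s = sorted(values)
    centroids = [s[(i * (len(s) - 1)) // max(k - 1, 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(x - centroids[c])) for x in values]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [x for x, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels


def two_step_subspace_clustering(points, k):
    """Step 1: project onto the leading eigenvector. Step 2: cluster the projection."""
    centered = center(points)
    direction = leading_eigenvector(covariance(centered))
    projected = [sum(p[j] * direction[j] for j in range(len(p))) for p in centered]
    return kmeans_1d(projected, k)


if __name__ == "__main__":
    # Toy data: two groups separated along the first feature, noise on the second.
    data = [[0.0, 0.0], [0.2, 1.0], [0.1, -1.0], [0.3, 0.5],
            [10.0, 0.0], [10.2, 1.0], [9.9, -1.0], [10.1, -0.5]]
    print(two_step_subspace_clustering(data, k=2))
```

The projection and the clustering are solved separately here, so the subspace is chosen without regard to the cluster structure. That separation is the two-step approximation the abstract contrasts with a joint formulation such as DisKmeans.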

Clustering Algorithm on Block Division of Documents

Document Clustering Using Locality Preserving Indexing

Design and simulation of a document clustering algorithm based on genetic algorithm

An Efficient Clustering Algorithm for Small Text Documents

Hierarchical Clustering Algorithms for Document Datasets

A clustering algorithm for distributed time-series data

An Improved K-Means Algorithm for Documents Clustering

Algorithm and Experiment Research of Textual Document Clustering Based on Improved K-means

K-means Document Clustering Based on Latent Dirichlet Allocation

A Clustering Algorithm for Short Documents Based On Concept Similarity

Experimental Estimation of Number of Clusters Based on Cluster Quality

A Parallel Varied Density-Based Clustering Algorithm with Optimized Data Partition

Document Clustering Using Sample Weighting

Application of Genetic Algorithm in Document Clustering

A Text Clustering Algorithm to Detect Basic Level Categories in Texts

Subspace Clustering by Directly Solving Discriminative K-means

Research on Location of Logistics Distribution Center Based on K-Means Clustering Algorithm

A Novel K'-Means Algorithm For Clustering Analysis

Block-Based K-Medoids Partitioning Method with Standardized Data to Improve Clustering Accuracy

Clique percolation method for finding naturally cohesive and overlapping document clusters

An effective web document clustering algorithm based on bisection and merge