Abstract: Applications in many domains, such as text mining and natural language processing, must deal with high-dimensional data. High-dimensional data may exhibit better clustering characteristics on a selected low-dimensional subspace. Subspace clustering projects the data onto a low-dimensional subspace before clustering. Traditional subspace clustering methods employ eigenvalue decomposition to find the projection of the input data and then perform K-means or kernel K-means to obtain the clustering matrix. Such methods are not only inefficient but also rely on a two-step procedure that yields only an approximate solution. Although Discriminative K-means (DisKmeans) integrates dimensionality reduction and clustering into a joint framework and solves the optimization problem by kernel K-means, it must iteratively update the centroids in the kernel space and the class labels, and it has quadratic time complexity. In this paper, we therefore propose an algorithm, Fast DisKmeans (FDKM), that obtains the cluster indicator matrix directly. The proposed method has linear time complexity, a significant reduction from the quadratic time complexity of DisKmeans. We also show that solving the objective function of DisKmeans is equivalent to representing the cluster assignment matrix by a low-dimensional linear mapping of the data. Based on this observation, we propose a second algorithm, Iterative Fast DisKmeans (IFDKM), which also has linear time complexity. Experiments on several datasets show the superior performance of FDKM and IFDKM.
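
For orientation, the traditional two-step pipeline that the abstract contrasts against (an eigenvalue decomposition to find the projection, followed by K-means on the projected data) might be sketched as below. This is only an illustrative baseline under stated assumptions, not the FDKM or IFDKM algorithm: it assumes the projection is taken from the top eigenvectors of the sample covariance (a PCA-style choice), it uses plain Lloyd iterations for K-means, and all function names are hypothetical.

```python
import numpy as np


def project_top_eigenvectors(X, dim):
    """Project the rows of X onto the top-`dim` eigenvectors of the sample covariance.

    Assumption: a PCA-style projection stands in for the eigenvalue
    decomposition step mentioned in the abstract.
    """
    Xc = X - X.mean(axis=0)                      # center the data
    cov = Xc.T @ Xc / max(len(X) - 1, 1)         # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    W = eigvecs[:, ::-1][:, :dim]                # eigenvectors of the largest eigenvalues
    return Xc @ W


def kmeans(Z, k, n_iter=100, seed=0):
    """Plain Lloyd's K-means on the rows of Z; returns one integer label per row."""
    rng = np.random.default_rng(seed)
    centroids = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster becomes empty.
        new_centroids = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels


def two_step_subspace_clustering(X, k, dim):
    """Step 1: project to `dim` dimensions. Step 2: cluster the projected data."""
    return kmeans(project_top_eigenvectors(X, dim), k)
```

Because the projection is fixed before clustering begins, the two steps are optimized separately, which is why the abstract describes the result as an approximate solution.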

Text Clustering Based on Feature Space

A Linguistic Feature Based Text Clustering Method

A Novel Text Clustering Algorithm Based on Inner Product Space Model of Semantic

Improving Short Text Classification Through Better Feature Space Selection

A Text Clustering Algorithm to Detect Basic Level Categories in Texts

An Evaluation on Feature Selection for Text Clustering

Fuzzy C-Means Text Clustering Based on Topic Concept Sub-Space

Algorithm and Experiment Research of Textual Document Clustering Based on Improved K-means

Clustering Text Data Streams

Research on a Text Data Preprocessing Method Suitable for Clustering Algorithm

Subspace Clustering by Directly Solving Discriminative K-means

Text Stream Clustering Algorithm Based on Adaptive Feature Selection

Concept chain based text clustering

Feature Dimension Reduction Short Text Clustering Combined with Semantic and Statistics

CWC: A Clustering-Based Feature Weighting Approach for Text Classification

Tag Clustering Algorithm Using Object-based Feature Vector

Document Clustering Based on Semantic Smoothing Approach

A Clustering Algorithm for Short Documents Based On Concept Similarity

Clustering Massive Text Data Streams by Semantic Smoothing Model

Improved GA-based Text Clustering Algorithm

A New Text Clustering Method Using Hidden Markov Model