Abstract: The evolution of data in text streams may cause both feature drift and concept drift. Feature drift, although less discussed in the literature, challenges learning algorithms by changing the feature space of the text representation. A common approach to handling concept drift is to maintain summarized groups of documents, known as micro-clusters. Despite its benefits, this scheme restricts document representation and is difficult to maintain under feature drift. In this paper, we propose an incremental text clustering algorithm that deals with both kinds of drift. The algorithm uses incremental word embedding, which is rarely studied in the context of evolving data streams. We also propose a novel approach that leverages hierarchical summarized concepts instead of micro-clusters. The concepts reflect the semantic structure of the text stream and are continuously updated in the face of concept drift and concept evolution. The proposed method enables a customized, low-dimensional, and interpretable document representation, which improves clustering quality. By employing concept modeling, and in contrast with many available approaches, the proposed algorithm decouples the handling of data evolution from document clustering. This modularization allows the granularity of document representation to be varied arbitrarily and allows for customized clustering when accessing historical documents is impractical. Experimental results on several real datasets, together with comparisons against other incremental and non-incremental methods, show that the proposed algorithm can deal with dynamics in the feature space, concept drift, and concept evolution while preserving its accuracy.
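For readers unfamiliar with the micro-cluster scheme that the abstract contrasts with, the following is a minimal sketch of the general idea. It does not implement the proposed algorithm. The class `MicroCluster`, the helpers `cosine` and `assign`, the sparse term-weight document format, and the similarity threshold are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List

Doc = Dict[str, float]  # hypothetical sparse document: term -> weight


@dataclass
class MicroCluster:
    """Summary of a group of documents: a count and per-term weight sums."""
    n_docs: int = 0
    term_sums: Dict[str, float] = field(default_factory=dict)

    def add(self, doc: Doc) -> None:
        # Absorb a document into the summary; the raw document is not stored.
        self.n_docs += 1
        for term, weight in doc.items():
            self.term_sums[term] = self.term_sums.get(term, 0.0) + weight

    def centroid(self) -> Doc:
        # Mean term weight over all absorbed documents.
        return {t: s / self.n_docs for t, s in self.term_sums.items()}


def cosine(a: Doc, b: Doc) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign(doc: Doc, clusters: List[MicroCluster], threshold: float = 0.3) -> MicroCluster:
    """Add doc to the most similar micro-cluster, or open a new one.

    The threshold value is an assumption chosen only for this example.
    """
    best, best_sim = None, threshold
    for mc in clusters:
        sim = cosine(doc, mc.centroid())
        if sim >= best_sim:
            best, best_sim = mc, sim
    if best is None:
        best = MicroCluster()
        clusters.append(best)
    best.add(doc)
    return best


clusters: List[MicroCluster] = []
assign({"cat": 1.0, "dog": 1.0}, clusters)
assign({"cat": 1.0, "dog": 0.5}, clusters)
assign({"stock": 1.0, "market": 1.0}, clusters)
```

In this sketch each summary is tied to the terms it has seen, so every new term in the stream adds a key to `term_sums`. This is one simple way to see why a changing feature space complicates summaries of this kind.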
