Vec2GC -- A Graph Based Clustering Method for Text Representations

Rajesh N Rao, Manojit Chakraborty
DOI: https://doi.org/10.48550/arXiv.2104.09439
2021-04-15
Information Retrieval
Abstract: NLP pipelines with limited or no labeled data rely on unsupervised methods for document processing. Unsupervised approaches typically depend on clustering of terms or documents. In this paper, we introduce a novel clustering algorithm, Vec2GC (Vector to Graph Communities), an end-to-end pipeline to cluster terms or documents for any given text corpus. Our method uses community detection on a weighted graph of the terms or documents, created using text representation learning. The Vec2GC clustering algorithm is a density-based approach that also supports hierarchical clustering.
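The general idea of turning embedding vectors into a weighted graph and then detecting communities can be illustrated with a minimal sketch. This is not the authors' implementation: the similarity threshold of 0.8, the use of cosine similarity as the edge weight, and the weighted label propagation detector are all assumptions chosen to keep the example dependency-free, and the function names (`build_graph`, `label_propagation`, `cluster`) are hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_graph(vectors, threshold=0.8):
    """Link items whose cosine similarity exceeds `threshold`.

    Both the threshold value and the choice of the similarity itself
    as the edge weight are assumptions made for this sketch.
    """
    graph = {i: {} for i in range(len(vectors))}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            sim = cosine(vectors[i], vectors[j])
            if sim > threshold:
                graph[i][j] = sim
                graph[j][i] = sim
    return graph


def label_propagation(graph, max_rounds=100):
    """Weighted label propagation, a simple stand-in community detector."""
    labels = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):
            if not graph[node]:
                continue  # isolated node keeps its own label
            score = defaultdict(float)
            for neighbour, weight in graph[node].items():
                score[labels[neighbour]] += weight
            # Deterministic tie-break: highest score, then smallest label.
            best = min(score, key=lambda lab: (-score[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    communities = defaultdict(list)
    for node, lab in labels.items():
        communities[lab].append(node)
    return sorted(sorted(members) for members in communities.values())


def cluster(vectors, threshold=0.8):
    """Vectors -> weighted graph -> communities (returned as index lists)."""
    return label_propagation(build_graph(vectors, threshold))


# Toy "embeddings": two groups pointing in nearly orthogonal directions.
vectors = [[1, 0], [0.9, 0.1], [0.95, 0.05],
           [0, 1], [0.1, 0.9], [0.05, 0.95]]
print(cluster(vectors))  # [[0, 1, 2], [3, 4, 5]]
```

In practice the vectors would come from a learned text representation, and a modularity-based community detector could be substituted for label propagation; a hierarchy could be obtained by re-running the detector inside each community.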