Text Clustering using Frequent Contextual Termset

T. M. Akhriza, Yinghua Ma, Jianhua Li
DOI: https://doi.org/10.1109/ICIII.2011.86
2011-11-26
Abstract: We introduce the frequent contextual termset (FCT), which is produced from the interestingness of documents, as an alternative concept of termset construction for text clustering. Compared with state-of-the-art termsets, the proposed approach has several advantages: (1) it is more efficient at producing termsets; (2) it is more effective at storing the vocabulary shared amongst documents, which expresses the context amongst them; and (3) it is more suitable for discovering the specificity of a dataset. To utilize FCT, we also introduce frequent contextual termset based hierarchical clustering (FCTHC), which adopts the concept of centroids from K-means, with some key differences. The experiment shows that FCT is the correct pattern for performing text clustering and that FCTHC provides a flexible approach to cluster construction.
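The abstract contrasts FCT with conventional frequent termsets and says FCTHC borrows the K-means idea of a centroid, but it does not define how FCT is derived from document interestingness. The sketch below therefore illustrates only those two background notions and is not the authors' FCT or FCTHC. Assumptions (all hypothetical, not from the paper): lowercase whitespace tokenization, support measured as the fraction of documents containing every term, an Apriori-style level-wise search capped at `max_size`, and a centroid taken as the mean term-frequency vector of the documents covering a termset. The helper names `tokenize`, `frequent_termsets`, and `centroid`, the `min_support` value, and the toy corpus are illustrative only.

```python
from collections import Counter
from itertools import combinations


def tokenize(doc):
    # Assumption: lowercase whitespace tokenization, no stemming or stop-word removal.
    return doc.lower().split()


def frequent_termsets(docs, min_support, max_size=2):
    """Apriori-style mining of term sets whose document support >= min_support.

    Support of a term set = fraction of documents containing every term in it.
    This is the conventional notion of a frequent term set, not the paper's FCT.
    """
    doc_sets = [set(tokenize(d)) for d in docs]
    n = len(doc_sets)
    # Level 1: document frequency of each single term.
    df = Counter(t for s in doc_sets for t in s)
    current = {frozenset([t]): c / n for t, c in df.items() if c / n >= min_support}
    result = dict(current)
    k = 2
    while current and k <= max_size:
        items = sorted({t for ts in current for t in ts})
        # Keep a size-k candidate only if all of its (k-1)-subsets are frequent.
        candidates = [
            frozenset(c)
            for c in combinations(items, k)
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        ]
        current = {}
        for cand in candidates:
            support = sum(1 for s in doc_sets if cand <= s) / n
            if support >= min_support:
                current[cand] = support
        result.update(current)
        k += 1
    return result


def centroid(docs, termset):
    """Mean term-frequency vector (as a dict) of the documents covering termset."""
    covered = [Counter(tokenize(d)) for d in docs if termset <= set(tokenize(d))]
    if not covered:
        return {}
    total = Counter()
    for tf in covered:
        total.update(tf)
    return {t: v / len(covered) for t, v in total.items()}


# Hypothetical toy corpus, for illustration only.
docs = [
    "text clustering with term sets",
    "frequent term sets for text clustering",
    "k-means clustering of documents",
    "hierarchical clustering of text documents",
]
fts = frequent_termsets(docs, min_support=0.5)
for ts, sup in sorted(fts.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(ts), sup)
print(centroid(docs, frozenset({"term", "sets"})))
```

Raising `min_support` shrinks the collection of termsets. In this sketch the dict returned by `centroid` stands in for a K-means centroid of the documents sharing a termset; how FCTHC departs from K-means is not stated in the abstract and is not modeled here.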