A Text Clustering Algorithm to Detect Basic Level Categories in Texts
Jingyun Xu, Yi Cai, Shuai Wang, Kai Yang, Qing Du, Jun Zhang, Li Yao, Jingjing Li
DOI: https://doi.org/10.1007/978-3-319-66733-1_8
2017-01-01
Abstract: With the rapid development of the Internet and the explosion of texts, an appropriate way to organize large amounts of text is necessary. Text clustering is of great practical importance for web-learning: it groups similar texts (e.g. documents, textbooks and online notes) to provide users with more valuable information. However, most existing text clustering algorithms are very sensitive to the parameters that users must supply, and an appropriate value is hard to set because a computer has no way of knowing what value is appropriate. To address this problem, and drawing on studies in cognitive psychology and our own observations, this paper first introduces basic level categories and category utility, and then proposes a non-parametric text clustering algorithm that detects basic level categories in texts automatically. The experimental results show that our algorithm significantly outperforms a basic level concept detection method, k-means, and single linkage clustering on different datasets.
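
The abstract names category utility without defining it. As a rough illustration only, the sketch below computes the commonly used Gluck and Corter form of category utility over binary word-presence features; the paper's own definition may differ. The function name `category_utility` and the representation of a document as a set of words are assumptions made for this example, not taken from the paper.

```python
from collections import Counter


def category_utility(clusters):
    """Category utility of a partition (standard Gluck-Corter form, assumed here).

    clusters: list of clusters; each cluster is a list of documents;
              each document is a set of words (binary word presence).
    """
    docs = [doc for cluster in clusters for doc in cluster]
    n = len(docs)
    if n == 0:
        return 0.0

    # Baseline: sum over words of P(word)^2 across the whole collection.
    overall = Counter(word for doc in docs for word in doc)
    base = sum((count / n) ** 2 for count in overall.values())

    total = 0.0
    for cluster in clusters:
        if not cluster:
            continue
        # Sum over words of P(word | cluster)^2.
        within = Counter(word for doc in cluster for word in doc)
        cond = sum((count / len(cluster)) ** 2 for count in within.values())
        # Weight the gain in predictability by the cluster's share of documents.
        total += (len(cluster) / n) * (cond - base)

    # Average over the number of clusters in the partition.
    return total / len(clusters)


if __name__ == "__main__":
    d0, d1 = {"cat", "pet"}, {"cat", "pet"}
    d2, d3 = {"car", "road"}, {"car", "road"}
    print(category_utility([[d0, d1], [d2, d3]]))  # topically coherent partition
    print(category_utility([[d0, d2], [d1, d3]]))  # mixed partition
```

Under this definition, a partition scores higher when knowing a document's cluster makes its words more predictable. Because the score is averaged over the number of clusters, partitions of different sizes can be compared without fixing a cluster count in advance, which is consistent with the non-parametric aim stated in the abstract.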