A Statistics-Based Semantic Relation Analysis Approach For Document Clustering

Xin Cheng, Duoqian Miao, Lei Wang
DOI: https://doi.org/10.1007/978-3-319-11740-9_31
2014-01-01
Abstract: Document clustering is a widely researched topic in machine learning, and a number of approaches have been proposed to represent and cluster documents. One recent trend in document clustering research is to incorporate semantic information into the document representation. In this paper, we introduce a novel technique for capturing robust and reliable semantic information from term-term co-occurrence statistics. First, we propose a novel method to evaluate the explicit semantic relation between terms from their co-occurrence information. Then, the underlying (implicit) semantic relation between terms is captured through their interactions with other terms. Lastly, these two complementary semantic relations are integrated to capture the complete semantic information in the original documents. Experimental results show that clustering performance improves significantly when the document representation is enriched with this semantic information.
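The three steps named in the abstract can be illustrated with a small sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration and not the authors' method: the explicit relation is taken to be a Jaccard-style document co-occurrence score, the underlying relation is taken to be the cosine similarity of two terms' explicit-relation profiles over all other terms, and the two are blended linearly with a hypothetical weight `alpha`. The function names and the toy documents are likewise hypothetical.

```python
from itertools import combinations
from math import sqrt


def explicit_relation(docs):
    """Assumed explicit measure: Jaccard score over document co-occurrence."""
    vocab = sorted({term for doc in docs for term in doc})
    doc_freq = {t: sum(1 for doc in docs if t in doc) for t in vocab}
    rel = {t: {} for t in vocab}
    for a, b in combinations(vocab, 2):
        both = sum(1 for doc in docs if a in doc and b in doc)
        union = doc_freq[a] + doc_freq[b] - both
        score = both / union if union else 0.0
        rel[a][b] = rel[b][a] = score
    return vocab, rel


def implicit_relation(vocab, rel):
    """Assumed implicit measure: cosine similarity of the two terms'
    explicit-relation profiles, computed over all *other* terms."""
    imp = {t: {} for t in vocab}
    for a, b in combinations(vocab, 2):
        others = [t for t in vocab if t != a and t != b]
        va = [rel[a][t] for t in others]
        vb = [rel[b][t] for t in others]
        dot = sum(x * y for x, y in zip(va, vb))
        norm_a = sqrt(sum(x * x for x in va))
        norm_b = sqrt(sum(y * y for y in vb))
        score = dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
        imp[a][b] = imp[b][a] = score
    return imp


def combined_relation(vocab, rel, imp, alpha=0.5):
    """Hypothetical integration step: linear blend of the two relations."""
    return {
        a: {b: alpha * rel[a][b] + (1 - alpha) * imp[a][b]
            for b in vocab if b != a}
        for a in vocab
    }


# Toy corpus: each document is a set of terms (hypothetical data).
docs = [
    {"car", "engine"},
    {"automobile", "engine"},
    {"car", "wheel"},
    {"automobile", "wheel"},
    {"banana", "fruit"},
]

vocab, rel = explicit_relation(docs)
imp = implicit_relation(vocab, rel)
combined = combined_relation(vocab, rel, imp, alpha=0.5)
```

In this toy corpus "car" and "automobile" never appear in the same document, so their explicit score is zero, yet they share the same neighbours ("engine", "wheel"), so their implicit score is close to 1. That is the kind of complementary evidence the abstract describes combining. A combined term-term matrix of this sort could then be used to enrich a bag-of-words document vector before clustering, although how the paper does this is not stated in the abstract.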