Adaptive Concept Resolution for document representation and its applications in text mining.

Lidong Bing,Shan Jiang,Wai Lam,Yan Zhang,Shoaib Jameel
DOI: https://doi.org/10.1016/j.knosys.2014.10.003
IF: 8.139
2015-01-01
Knowledge-Based Systems
Abstract:It is well-known that synonymous and polysemous terms often bring in some noise when we calculate the similarity between documents. Existing ontology-based document representation methods are static so that the selected semantic concepts for representing a document have a fixed resolution. Therefore, they are not adaptable to the characteristics of document collection and the text mining problem in hand. We propose an Adaptive Concept Resolution (ACR) model to overcome this problem. ACR can learn a concept border from an ontology taking into the consideration of the characteristics of the particular document collection. Then, this border provides a tailor-made semantic concept representation for a document coming from the same domain. Another advantage of ACR is that it is applicable in both classification task where the groups are given in the training document set and clustering task where no group information is available. The experimental results show that ACR outperforms an existing static method in almost all cases. We also present a method to integrate Wikipedia entities into an expert-edited ontology, namely WordNet, to generate an enhanced ontology named WordNet-Plus, and its performance is also examined under the ACR model. Due to the high coverage, WordNet-Plus can outperform WordNet on data sets having more fresh documents in classification.
What problem does this paper attempt to address?