WordNet and Semantic similarity based approach for document clustering

J. Laxminarayana, S. Desai
DOI: https://doi.org/10.1109/CSITSS.2016.7779377
2016-10-01
Abstract: With the continued growth of the internet, the number of text documents in electronic form is increasing rapidly. Document clustering, which organizes such large collections of documents into meaningful clusters, has therefore become an important technique. Traditional clustering methods group documents based on statistical features. Because they ignore the semantic relationships between documents, the documents they cluster together may not be conceptually similar to one another. This paper introduces a document clustering model that groups documents with similar concepts together. The proposed model first identifies all coreferences in each document in the collection. It then tackles polysemy and synonymy by selecting the appropriate sense of each word based on its context, using WordNet and semantic similarity. The proposed clustering model is implemented for the classic4 dataset, and the results show an improvement in efficiency.
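The sense-selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny taxonomy (PARENT), the sense inventory (SENSES), and the function names below are hypothetical stand-ins for WordNet's hypernym hierarchy and synsets, and the path-based score is only one of several possible semantic similarity measures.

```python
# Hypothetical toy taxonomy standing in for WordNet hypernym links:
# each sense or concept maps to its parent concept.
PARENT = {
    "finance": "entity",
    "geography": "entity",
    "bank.institution": "finance",
    "loan.debt": "finance",
    "money.currency": "finance",
    "bank.riverside": "geography",
    "river.stream": "geography",
    "water.liquid": "geography",
}

# Hypothetical sense inventory: surface word -> candidate senses.
SENSES = {
    "bank": ["bank.institution", "bank.riverside"],
    "loan": ["loan.debt"],
    "money": ["money.currency"],
    "river": ["river.stream"],
    "water": ["water.liquid"],
}


def path_to_root(node):
    """Return the chain of concepts from node up to the taxonomy root."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def path_similarity(a, b):
    """Score 1 / (1 + number of edges between a and b in the taxonomy)."""
    path_a, path_b = path_to_root(a), path_to_root(b)
    for steps_a, concept in enumerate(path_a):
        if concept in path_b:
            return 1.0 / (1 + steps_a + path_b.index(concept))
    return 0.0  # no shared ancestor


def disambiguate(word, context):
    """Pick the sense of word that is most similar to the context words."""
    candidates = SENSES.get(word)
    if not candidates:
        return None

    def score(sense):
        total = 0.0
        for other in context:
            if other == word or other not in SENSES:
                continue
            # Credit the best-matching sense of each context word.
            total += max(path_similarity(sense, s) for s in SENSES[other])
        return total

    return max(candidates, key=score)


financial_sense = disambiguate("bank", ["loan", "money"])
river_sense = disambiguate("bank", ["river", "water"])
```

In this sketch the same surface word resolves to different senses depending on its neighbours, which is how a concept-based representation can keep documents about finance and documents about rivers apart even when they share the word "bank".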
Computer Science