Abstract: Traditional clustering algorithms do not consider the semantic relationships among words and therefore cannot accurately represent the meaning of documents. To overcome this problem, semantic information from ontologies such as WordNet has been widely used to improve the quality of text clustering. However, several challenges remain, such as synonymy and polysemy, high dimensionality, extracting core semantics from texts, and assigning appropriate descriptions to the generated clusters. In this paper, we report our attempt at integrating WordNet with lexical chains to alleviate these problems. The proposed approach exploits the hierarchical structure and relations of the ontology to provide a more accurate assessment of the similarity between terms for word sense disambiguation. Furthermore, we introduce lexical chains to extract a set of semantically related words from texts, which can represent the semantic content of the texts. Although lexical chains have been extensively used in text summarization, their potential impact on the text clustering problem has not been fully investigated. Our integrated approach identifies the theme of documents based on the disambiguated core features extracted, and at the same time reduces the dimensionality of the feature space. Experimental results on Reuters-21578 show that the proposed framework improves clustering performance significantly compared to several classical methods. © 2014 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).
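To make the idea of lexical chains more concrete, the following is a minimal sketch of how semantically related terms might be grouped using a taxonomy-based similarity. It is not the authors' implementation: the `PARENT` taxonomy is a tiny hand-made stand-in for WordNet hypernym links, and the function names, the greedy chaining rule, and the `threshold` value are all assumptions chosen for illustration.

```python
# Hypothetical toy taxonomy (child -> parent), standing in for WordNet hypernyms.
PARENT = {
    "vehicle": "entity",
    "finance": "entity",
    "car": "vehicle",
    "truck": "vehicle",
    "bus": "vehicle",
    "bank": "finance",
    "loan": "finance",
    "interest": "finance",
}


def path_to_root(term):
    """Return the list [term, parent, grandparent, ...] up to the root."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def path_similarity(a, b):
    """Similarity = 1 / (1 + number of edges on the path through the
    lowest common ancestor); 0.0 if the terms share no ancestor."""
    path_a = path_to_root(a)
    depth_in_b = {node: i for i, node in enumerate(path_to_root(b))}
    for i, node in enumerate(path_a):
        if node in depth_in_b:
            return 1.0 / (1 + i + depth_in_b[node])
    return 0.0


def build_chains(terms, threshold=0.3):
    """Greedy chaining (an assumed, simplified rule): a term joins the first
    chain containing a member at least `threshold` similar, else starts one."""
    chains = []
    for term in terms:
        for chain in chains:
            if any(path_similarity(term, member) >= threshold for member in chain):
                chain.append(term)
                break
        else:
            chains.append([term])
    return chains


if __name__ == "__main__":
    document_terms = ["car", "bank", "truck", "loan", "bus", "interest"]
    for chain in build_chains(document_terms):
        print(chain)
```

In this toy setting, each resulting chain could serve as a single feature in place of its member terms, which is one way the dimensionality reduction mentioned in the abstract might arise.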

High-Efficiency Text Clustering Algorithm Based on Semantic Distance

New method of hybrid intelligent text clustering based on semantic similarity

A Method to Improve Text Clustering Algorithm Quality

Text Clustering Based on Feature Space

Research on K-means Text Clustering Algorithm Based on Semantic

A Text Hybrid Clustering Algorithm Based on Hownet Semantics

A Novel Text Clustering Algorithm Based on Inner Product Space Model of Semantic

Online Comment Clustering Based on an Improved Semantic Distance

Text Clustering Based on Improved Latent Semantic Analysis

An Incremental Algorithm of Text Clustering Based on Semantic Sequences

Semantic Correlation Network Based Text Clustering

Short text clustering based on word embeddings and EMD

Semantic document clustering based on ontology

A Semantic Approach for Text Clustering Using WordNet and Lexical Chains

A New Efficient Text Clustering Ensemble Algorithm Based on Semantic Sequences

Uyghur text clustering based on semantic word set

Incremental Algorithm of Text Soft Clustering

Research on Neural Network Clustering Algorithm for Short Text

Semi-Supervised Semantic Dynamic Text Clustering Algorithm

Text Similarity Measurement of Semantic Cognition Based on Word Vector Distance Decentralization with Clustering Analysis

Clustering Technology for High Dimensional Data Based on Semantics