Abstract: The measurement of semantic similarity between concepts is an important research topic in natural language processing. In the past, several approaches for measuring the semantic similarity between concepts have been proposed based on WordNet or Wikipedia. However, improvements in the measurement accuracy of most methods have led to a dramatic increase in time complexity, and the existing methods do not effectively integrate WordNet and Wikipedia. In this paper, we focus on designing an efficient semantic similarity method based on WordNet and Wikipedia. First, to improve the accuracy of WordNet edge-based measures, we propose an edge weight model that combines edge and density information by assigning a weight to each edge adaptively, based on the number of direct hyponyms of the subsumer. Second, to improve the computational efficiency of the existing Wikipedia link vector-based measures, we propose a new Wikipedia link feature-based semantic similarity method that converts Wikipedia links into semantic knowledge and replaces the TF-IDF statistical weight model used in the existing measures. In addition, we propose two new word disambiguation strategies to further improve the accuracy of Wikipedia link-based measures. Finally, to fully exploit the advantages of WordNet and Wikipedia, we propose two new aggregation schemas that combine WordNet “is-a” semantics with Wikipedia link semantics, replacing the current aggregation schemas that combine WordNet “is-a” semantics with category semantics in Wikipedia. The experimental results show that our aggregation models are outstanding in terms of accuracy, efficiency and word coverage compared to state-of-the-art similarity measures.
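
The idea of weighting each edge by the number of direct hyponyms of its subsumer can be illustrated with a minimal sketch. The abstract does not state the weighting formula, so everything below is an assumption made for illustration: the toy taxonomy `PARENT`, the weight `1 / (1 + ln(h))` for a subsumer with `h` direct hyponyms, and the mapping from distance to similarity are hypothetical and are not the model proposed in the paper.

```python
import math

# Hypothetical toy "is-a" taxonomy (child -> parent); not taken from WordNet.
PARENT = {
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "bird": "animal",
    "tree": "plant",
}


def direct_hyponym_count(node):
    """Number of direct children (hyponyms) of `node` in the toy taxonomy."""
    return sum(1 for parent in PARENT.values() if parent == node)


def edge_weight(subsumer):
    """Assumed weight of any edge directly below `subsumer`.

    Illustrative choice only: the more direct hyponyms the subsumer has
    (a denser region), the shorter each edge below it is taken to be.
    """
    return 1.0 / (1.0 + math.log(direct_hyponym_count(subsumer)))


def ancestors(node):
    """Chain from `node` up to the root, starting with `node` itself."""
    chain = [node]
    while node in PARENT:
        node = PARENT[node]
        chain.append(node)
    return chain


def weighted_distance(a, b):
    """Sum of edge weights on the path from a to b via their lowest common subsumer."""
    chain_a, chain_b = ancestors(a), ancestors(b)
    in_chain_b = set(chain_b)
    lcs = next(n for n in chain_a if n in in_chain_b)
    dist = 0.0
    for chain in (chain_a, chain_b):
        for node in chain[:chain.index(lcs)]:
            dist += edge_weight(PARENT[node])
    return dist


def similarity(a, b):
    """Map the weighted distance to a score in (0, 1]; an assumed transformation."""
    return 1.0 / (1.0 + weighted_distance(a, b))
```

Under these assumptions, `similarity("dog", "cat")` is higher than `similarity("dog", "tree")`, since the first pair is joined by two short edges below a subsumer with three direct hyponyms, while the second pair is joined only through the root.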

Fusing Syntax and Word Embedding Knowledge for Measuring Semantic Similarity

A Semantic Textual Similarity Measurement Model Based on the Syntactic-Semantic Representation

Information mining and similarity computation for semi-/un-structured sentences from the social data

Sentence Similarity Based on Semantic Vector Model.

BIT at SemEval-2017 Task 1: Using Semantic Information Space to Evaluate Semantic Textual Similarity.

Computing Semantic Text Similarity Using Rich Features.

Semantic Similarity Computation Between Sentences

A Combined Measure for Text Semantic Similarity

BIT at SemEval-2016 Task 1: Sentence Similarity Based on Alignments and Vector with the Weight of Information Content.

Exploration on Efficient Similar Sentences Extraction.

Information Extraction and Similarity Computation for Semi-/Un-Structured Sentences from the Cyberdata

Sentence Similarity Computation by Integrating Shallow and Deep Information

Assessing Text Semantic Similarity Using Ontology

Syntactic Impact On Sentence Similarity Measure In Archive-Based QA System

Similarity Calculation of Fusion Sentence Surface Information and the Syntax Structure

Improving Semantic Similarity Computation Via Subgraph Feature Fusion Based on Semantic Awareness

A Short-Text Similarity Model Combining Semantic and Syntactic Information

Calculating Statistical Similarity Between Sentences

An Efficient Approach for Measuring Semantic Similarity Combining WordNet and Wikipedia

A Text Similarity Measurement Employs Semantic Dictionary-based Sentiment Analysis

Measuring Short Text Semantic Similarity Using Multiple Measurements