Abstract: In natural language processing (NLP), text similarity calculation is widely used in information retrieval, machine translation, text mining, and related tasks. Similarity between texts need not be defined only in terms of words with similar meanings. Domain similarity, which evaluates similarity with respect to a reference domain, is a promising approach for handling large documents. With domain similarity calculation, the degree of similarity can be controlled at different semantic levels, so that texts can be extracted at different domain granularities. For example, web pages about the Lakers, the NBA, basketball, and sports can each be retrieved by using different domain similarity settings. LSI (Latent Semantic Indexing) is a feasible approach to calculating text domain similarity: by controlling the number of topics, domain similarity can be determined at different granularities. However, LSI requires the number of topics to be specified in advance, and its performance is strongly affected by that choice. In this paper, an adaptive method is applied to text domain similarity calculation. TF-IDF is used to weight word frequencies in the text, and the number of topics in the mixed text set is obtained automatically through dimensionality reduction and clustering. The resulting number of clusters is then used as the number of topics for the LSI subspace mapping, in which the similarity between text domains is calculated. Experimental results show that the proposed method achieves higher accuracy in text similarity calculation than the other algorithms it was compared with.
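
The pipeline outlined in the abstract can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not specify which dimensionality reduction or clustering algorithm is used, so the sketch substitutes a truncated SVD that keeps 90% of the spectral energy and a simple greedy "leader" clustering with a cosine threshold of 0.2. The function names (`tfidf_matrix`, `estimate_num_topics`, `lsi_similarity`), both thresholds, and the toy corpus are hypothetical.

```python
import math
import re
from collections import Counter

import numpy as np


def tfidf_matrix(docs):
    """Return an L2-normalised TF-IDF matrix (documents x terms)."""
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: j for j, t in enumerate(vocab)}
    df = Counter(t for toks in tokenized for t in set(toks))
    n = len(docs)
    X = np.zeros((n, len(vocab)))
    for i, toks in enumerate(tokenized):
        for term, tf in Counter(toks).items():
            idf = math.log((1 + n) / (1 + df[term])) + 1.0  # smoothed IDF
            X[i, index[term]] = tf * idf
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)


def cosine_matrix(V):
    """Pairwise cosine similarity between the rows of V."""
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    V = V / np.where(norms == 0, 1.0, norms)
    return V @ V.T


def estimate_num_topics(X, energy=0.9, threshold=0.2):
    """Assumed stand-in for the adaptive step: reduce, cluster, count clusters."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    # Keep the smallest number of components reaching the energy fraction.
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)
    r = int(np.searchsorted(cum, energy) + 1)
    sims = cosine_matrix(U[:, :r] * s[:r])
    # Greedy leader clustering: a document starts a new cluster when it is
    # not similar enough to any existing cluster leader.
    leaders = []
    for i in range(X.shape[0]):
        if not any(sims[i, j] >= threshold for j in leaders):
            leaders.append(i)
    return len(leaders)


def lsi_similarity(X, k):
    """Cosine similarity between documents in a k-topic LSI subspace."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return cosine_matrix(U[:, :k] * s[:k])


# Toy corpus (hypothetical): two domains with disjoint vocabularies.
docs = [
    "lakers win nba basketball game",
    "nba basketball playoffs lakers",
    "basketball game nba score",
    "bake bread flour oven recipe",
    "recipe flour sugar oven cake",
    "oven bake cake recipe",
]

X = tfidf_matrix(docs)
k = estimate_num_topics(X)   # number of clusters, reused as LSI topic count
sims = lsi_similarity(X, k)  # domain similarity in the k-topic subspace

print("estimated topics:", k)
print("same-domain similarity (doc 0 vs doc 1):", round(float(sims[0, 1]), 3))
print("cross-domain similarity (doc 0 vs doc 3):", round(float(sims[0, 3]), 3))
```

Passing a different `k` to `lsi_similarity` changes the granularity at which documents are considered to belong to the same domain, which is the control knob the abstract describes. The point of the adaptive step is that `k` is taken from the cluster count rather than chosen by hand.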

An adaptive method for text domain similarity calculation
