Estimating Text Similarity based on Semantic Concept Embeddings

Tim vor der Brück, Marc Pouly
2024-01-09
Abstract: Due to their ease of use and high accuracy, Word2Vec (W2V) word embeddings enjoy great success in the semantic representation of words, sentences, and whole documents, as well as in semantic similarity estimation. However, they have the shortcoming that they are extracted directly from a surface representation, which does not adequately represent human thought processes, and they also perform poorly for highly ambiguous words. Therefore, we propose Semantic Concept Embeddings (CEs) based on the MultiNet Semantic Network (SN) formalism, which address both shortcomings. The evaluation on a marketing target group distribution task showed that the accuracy of predicted target groups can be increased by combining traditional word embeddings with semantic CEs.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the limitations of traditional surface-based representation methods such as Word2Vec for text similarity estimation: they represent highly ambiguous words inaccurately and do not adequately reflect human thought processes. To resolve these issues, the authors propose Semantic Concept Embeddings (CEs), which are generated from formal meaning representations in the MultiNet Semantic Network formalism, with the aim of improving performance on semantic similarity estimation tasks. Specifically, the goals of the paper are:

1. To propose a new semantic representation, Semantic Concept Embeddings (CEs), that overcomes the weaknesses of traditional word embeddings such as Word2Vec in handling highly ambiguous words.
2. To use CEs based on the MultiNet Semantic Network formalism to improve the accuracy of target group prediction for short text fragments in marketing (market segmentation) tasks.
3. To validate the proposed CEs experimentally on a marketing target group allocation task and compare them with baseline methods.

The paper reports that combining traditional word embeddings with CEs improves prediction accuracy in the marketing target group allocation task.
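The idea of combining word embeddings with concept embeddings can be illustrated with a small toy sketch. It assumes that each text is represented by the centroid (mean) of its vectors, that similarity is cosine similarity, and that the word-level and concept-level scores are mixed linearly with a hypothetical weight `alpha`; the summary above does not say how the paper actually combines them. The concept identifiers (`bank.finance`, `bank.river`), the target group profiles, and all two-dimensional vectors are made-up placeholders, not trained embeddings or output of a real MultiNet parser.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]

def centroid(items: Sequence[str], table: Dict[str, Vector]) -> Vector:
    """Average the vectors of all items found in the lookup table; unknown items are skipped."""
    found = [table[item] for item in items if item in table]
    if not found:
        raise ValueError("none of the items has a vector")
    dim = len(found[0])
    return [sum(vec[d] for vec in found) / len(found) for d in range(dim)]

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def combined_similarity(words_a: Sequence[str], words_b: Sequence[str],
                        concepts_a: Sequence[str], concepts_b: Sequence[str],
                        word_vecs: Dict[str, Vector], concept_vecs: Dict[str, Vector],
                        alpha: float = 0.5) -> float:
    """Hypothetical linear mix: alpha * word-level cosine + (1 - alpha) * concept-level cosine."""
    sim_words = cosine(centroid(words_a, word_vecs), centroid(words_b, word_vecs))
    sim_concepts = cosine(centroid(concepts_a, concept_vecs), centroid(concepts_b, concept_vecs))
    return alpha * sim_words + (1 - alpha) * sim_concepts

def predict_target_group(words: Sequence[str], concepts: Sequence[str],
                         profiles: Dict[str, Tuple[List[str], List[str]]],
                         word_vecs: Dict[str, Vector], concept_vecs: Dict[str, Vector],
                         alpha: float = 0.5) -> str:
    """Return the target group whose (words, concepts) profile is most similar to the text."""
    return max(profiles, key=lambda group: combined_similarity(
        words, profiles[group][0], concepts, profiles[group][1],
        word_vecs, concept_vecs, alpha))

# Toy 2-D vectors: dimension 0 ~ "finance", dimension 1 ~ "nature".
# The surface word "bank" gets a single vector that mixes both senses.
word_vecs = {"bank": [1.0, 1.0], "loan": [1.0, 0.0], "money": [1.0, 0.0], "river": [0.0, 1.0]}
# Hypothetical disambiguated concepts: each sense of "bank" has its own vector.
concept_vecs = {"bank.finance": [1.0, 0.0], "bank.river": [0.0, 1.0],
                "loan": [1.0, 0.0], "money": [1.0, 0.0], "river": [0.0, 1.0]}
# Hypothetical target group profiles: (surface words, concepts).
profiles = {
    "investors": (["money", "loan"], ["money", "loan"]),
    "anglers": (["river", "bank"], ["river", "bank.river"]),
}

if __name__ == "__main__":
    # The one-word text "bank", read in its financial sense by the (assumed) parser.
    print(predict_target_group(["bank"], ["bank.finance"], profiles, word_vecs, concept_vecs, alpha=1.0))  # anglers
    print(predict_target_group(["bank"], ["bank.finance"], profiles, word_vecs, concept_vecs, alpha=0.5))  # investors
```

With `alpha=1.0` (word embeddings only), the ambiguous surface vector for "bank" ends up closer to the angler profile; adding the concept-level score, which distinguishes the financial reading, moves the prediction to the investor profile. This mirrors the motivation stated in the abstract, but it is only an illustration, not a reproduction of the paper's experiment.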