Abstract: It is widely accepted that traditional word embedding models, which rely on the distributional semantics hypothesis, are of limited use for modeling contrasting meaning. The distributional semantics hypothesis holds that words occurring in similar contexts have similar representations in vector space. However, synonyms and antonyms often occur in similar contexts, so they end up close to each other in vector space, which makes it difficult to distinguish antonyms from synonyms. To address this challenge, we propose an optimization model named Lexicon-based Word Embedding Tuning (LWET). The goal of LWET is to incorporate reliable semantic lexicons to tune the distribution of pre-trained word embeddings in vector space so as to improve their ability to distinguish antonyms from synonyms. To speed up the training of LWET, we propose two approximation algorithms: positive sampling and quasi-hierarchical softmax. Positive sampling is faster than quasi-hierarchical softmax, but at the cost of worse performance. In experiments, LWET and other state-of-the-art models are evaluated on three tasks: antonym recognition, distinguishing antonyms from synonyms, and word similarity. The results on the first two tasks show that LWET significantly improves the ability of word embeddings to detect antonyms, achieving state-of-the-art performance. On word similarity, LWET performs slightly better than the state-of-the-art models, which indicates that LWET preserves and even strengthens the semantic structure rather than destroying it when tuning word distributions in vector space. Overall, compared with related work, LWET not only achieves similar or better performance but also speeds up the training process.
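The general idea of tuning pre-trained embeddings with a lexicon can be illustrated with a small sketch. This is not the LWET objective, which the abstract does not spell out, and it implements neither positive sampling nor quasi-hierarchical softmax. It is a simple attract/repel update swapped in for illustration: synonym pairs are pulled together, antonym pairs are pushed apart while their cosine similarity exceeds a margin, and a regularizer keeps each vector near its pre-trained position. The function names, the toy 2-D vectors, and all hyperparameters (`lr`, `reg`, `margin`, `epochs`) are assumptions made for this example.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def unit(deg):
    """Toy 2-D unit vector at the given angle (hypothetical helper)."""
    t = np.deg2rad(deg)
    return np.array([np.cos(t), np.sin(t)])

def tune_embeddings(vectors, synonyms, antonyms,
                    lr=0.1, reg=0.1, margin=0.0, epochs=50):
    """Illustrative attract/repel tuning; NOT the LWET objective."""
    # Work on normalized copies so the input vectors are left untouched.
    tuned = {w: v / np.linalg.norm(v) for w, v in vectors.items()}
    anchor = {w: v.copy() for w, v in tuned.items()}  # pre-trained positions
    for _ in range(epochs):
        # Attract: a gradient step that increases the dot product of synonyms.
        for a, b in synonyms:
            va, vb = tuned[a].copy(), tuned[b].copy()
            tuned[a] += lr * vb
            tuned[b] += lr * va
        # Repel: decrease the dot product of antonyms, but only while
        # they are still more similar than the margin allows.
        for a, b in antonyms:
            va, vb = tuned[a].copy(), tuned[b].copy()
            if cosine(va, vb) > margin:
                tuned[a] -= lr * vb
                tuned[b] -= lr * va
        # Regularize toward the pre-trained vector, then renormalize.
        for w in tuned:
            tuned[w] += lr * reg * (anchor[w] - tuned[w])
            tuned[w] /= np.linalg.norm(tuned[w])
    return tuned

# Toy "pre-trained" space in which the antonym pair (hot, cold) starts out
# closer together than the synonym pair (hot, warm).
vectors = {"hot": unit(0), "cold": unit(10), "warm": unit(-20), "cool": unit(30)}
synonyms = [("hot", "warm"), ("cold", "cool")]
antonyms = [("hot", "cold")]

tuned = tune_embeddings(vectors, synonyms, antonyms)
for a, b in synonyms + antonyms:
    print(a, b, round(cosine(vectors[a], vectors[b]), 3),
          "->", round(cosine(tuned[a], tuned[b]), 3))
```

The margin and the regularizer are there to limit how far the tuned vectors drift from the pre-trained space; how LWET itself balances lexicon constraints against the original semantic structure is described in the paper, not here.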

Improving Word Embeddings for Antonym Detection Using Thesauri and SentiWordNet.

Using Context-to-Vector with Graph Retrofitting to Improve Word Embeddings

Improve Word Embedding Using Both Writing and Pronunciation.

Revisit Word Embeddings with Semantic Lexicons for Modeling Lexical Contrast

Not All Synonyms Are Created Equal: Incorporating Similarity of Synonyms to Enhance Word Embeddings

Chinese Word Sense Embedding with SememeWSD and Synonym Set

Visual Exploration and Comparison of Word Embeddings.

Learning Effective Word Embedding Using Morphological Word Similarity

An Exploration Of Semantic Relations In Neural Word Embeddings Using Extrinsic Knowledge

Improving Word Vector with Prior Knowledge in Semantic Dictionary.

BioWordVec, improving biomedical word embeddings with subword information and MeSH

Diachronic Synonymy and Polysemy: Exploring Dynamic Relation Between Forms and Meanings of Words Based on Word Embeddings

Learning Word Sense Embeddings from Word Sense Definitions

Portuguese Word Embeddings: Evaluating on Word Analogies and Natural Language Tasks

Word Embedding Attention Network: Generating Words by Querying Distributed Word Representations for Paraphrase Generation

Text Semantic Steganalysis Based On Word Embedding

Improved Learning of Word Embeddings with Word Definitions and Semantic Injection.

An Improved Historical Embedding without Alignment

Enhancing Semantic Word Representations by Embedding Deeper Word Relationships

Exploiting WordNet Synset and Hypernym Representations for Answer Selection.

Improving the Accuracy of Pre-trained Word Embeddings for Sentiment Analysis