Abstract: It is widely accepted that traditional word embedding models, which rely on the distributional semantics hypothesis, are of limited use for modeling contrasting meaning. The distributional semantics hypothesis states that words occurring in similar contexts have similar representations in vector space. However, synonyms and antonyms often occur in similar contexts, so they lie close to each other in vector space, which makes it very difficult to distinguish antonyms from synonyms. To address this challenge, we propose an optimization model named Lexicon-based Word Embedding Tuning (LWET). The goal of LWET is to incorporate reliable semantic lexicons to tune the distribution of pre-trained word embeddings in the vector space and thereby improve their ability to distinguish antonyms from synonyms. To speed up the training of LWET, we propose two approximation algorithms: positive sampling and quasi-hierarchical softmax. Positive sampling is faster than quasi-hierarchical softmax, but at the cost of worse performance. In experiments, LWET and other state-of-the-art models are evaluated on three tasks: antonym recognition, distinguishing antonyms from synonyms, and word similarity. The results on the first two tasks show that LWET significantly improves the ability of word embeddings to detect antonyms, achieving state-of-the-art performance. On word similarity, LWET performs slightly better than the state-of-the-art models. This indicates that LWET preserves and even strengthens the semantic structure of the embeddings, rather than destroying it, when tuning word distributions in the vector space. Overall, compared with related work, LWET not only achieves similar or better performance but also speeds up the training process.
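The idea of tuning pre-trained embeddings with a lexicon can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three-dimensional toy vectors, the function names `cosine`, `normalize`, and `tune`, and the simple attract/repel update rule. It is not the LWET objective, and it implements neither positive sampling nor quasi-hierarchical softmax, none of which are specified in this abstract. It only shows how synonym and antonym pairs from a lexicon could pull an antonym away from a word while keeping a synonym close.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def normalize(v):
    """Rescale a vector to unit length."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def tune(emb, synonyms, antonyms, lr=0.1, steps=50):
    """Hypothetical attract/repel tuning (an assumption, not LWET itself).

    Synonym pairs are moved toward each other and antonym pairs away
    from each other; vectors are renormalized after every update.
    """
    tuned = {w: normalize(v) for w, v in emb.items()}
    for _ in range(steps):
        for pairs, sign in ((synonyms, 1.0), (antonyms, -1.0)):
            for a, b in pairs:
                va, vb = tuned[a], tuned[b]
                tuned[a] = normalize([x + sign * lr * y for x, y in zip(va, vb)])
                tuned[b] = normalize([y + sign * lr * x for x, y in zip(va, vb)])
    return tuned

# Toy "pre-trained" vectors (made up): the antonym "cold" starts out
# closer to "hot" than the synonym "warm" does.
toy = {
    "hot": [1.0, 0.2, 0.0],
    "cold": [0.9, 0.3, 0.1],
    "warm": [0.7, 0.1, 0.6],
}
tuned = tune(toy, synonyms=[("hot", "warm")], antonyms=[("hot", "cold")])

print("before:", cosine(toy["hot"], toy["warm"]), cosine(toy["hot"], toy["cold"]))
print("after: ", cosine(tuned["hot"], tuned["warm"]), cosine(tuned["hot"], tuned["cold"]))
```

In this toy setting the antonym pair ends up less similar than the synonym pair after tuning. A real tuning objective would also need to constrain how far each vector drifts from its pre-trained position, which this sketch does not attempt.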
