Abstract: Multi-sense word embedding is an important extension of neural word embeddings. By leveraging the context of each word instance, multi-prototype word embeddings have been developed to represent the multiple senses of a word. Unfortunately, because a word occurs in widely varying contexts, such context-based approaches inevitably produce several senses that should in fact be a single one. Shi et al. (2016) used WordNet to evaluate the neighborhood similarity of each sense pair in order to detect such pseudo multi-senses. In this paper, we present a novel framework for unsupervised corpus sense tagging, which consists of four main steps: (a) train multi-sense word embeddings on the given corpus using existing multi-sense word embedding frameworks; (b) detect pseudo multi-senses in the obtained embeddings without requiring any extra language resources; (c) label each word in the corpus with a specific sense tag according to the result of pseudo multi-sense detection; (d) re-train multi-sense word embeddings with the pre-selected sense tags. We evaluate our framework by training word embeddings on the resulting sense-specific corpus. On word similarity, word analogy, and sentence understanding tasks, the embeddings trained on the sense-specific corpus obtain better results than the baseline strategy applied in step (a).
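Steps (b) and (c) can be illustrated with a small sketch. The abstract does not say how pseudo multi-senses are detected without extra resources, so the code below assumes a swapped-in heuristic. Two senses of a word are treated as pseudo multi-senses when the sets of their k nearest neighbors in a word-level embedding vocabulary overlap (Jaccard similarity) at or above a threshold. This follows the spirit of the neighborhood similarity used by Shi et al. (2016), but uses embedding neighbors in place of WordNet. The function names `merge_pseudo_senses` and `tag_corpus`, the `word#sense` tag format, the `k` and `threshold` values, and the toy 2-D vectors are all hypothetical. This is not the paper's own algorithm.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def nearest_words(vec: Vector, vocab: Dict[str, Vector], k: int, exclude: str) -> Set[str]:
    """The k vocabulary words most cosine-similar to `vec`, skipping the target word."""
    scored = sorted(
        ((cosine(vec, v), w) for w, v in vocab.items() if w != exclude),
        reverse=True,
    )
    return {w for _, w in scored[:k]}


def merge_pseudo_senses(word: str, senses: List[Vector], vocab: Dict[str, Vector],
                        k: int = 3, threshold: float = 0.5) -> List[int]:
    """Step (b), hypothetical heuristic: map each sense index of `word` to the
    smallest index in its merged group. Senses whose neighbor sets have a
    Jaccard overlap >= threshold are treated as one (a pseudo multi-sense)."""
    parent = list(range(len(senses)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    nbrs = [nearest_words(s, vocab, k, exclude=word) for s in senses]
    for i in range(len(senses)):
        for j in range(i + 1, len(senses)):
            union = nbrs[i] | nbrs[j]
            jaccard = len(nbrs[i] & nbrs[j]) / len(union) if union else 0.0
            if jaccard >= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # Keep the smaller index as the group representative.
                    parent[max(ri, rj)] = min(ri, rj)
    return [find(i) for i in range(len(senses))]


def tag_corpus(tokens: List[Tuple[str, int]], merge_map: Dict[str, List[int]]) -> List[str]:
    """Step (c): rewrite (word, raw sense id) tokens from step (a) as 'word#sense'
    using the merged ids; words without a merge map are left untagged."""
    return [f"{w}#{merge_map[w][s]}" if w in merge_map else w for w, s in tokens]


# Toy example with made-up 2-D vectors: senses 0 and 1 of "bank" share
# financial neighbors (a pseudo multi-sense); sense 2 is the river sense.
vocab = {
    "money": [1.0, 0.1], "loan": [0.9, 0.2], "cash": [1.0, 0.0],
    "river": [0.1, 1.0], "shore": [0.0, 1.0], "water": [0.2, 0.9],
}
bank_senses = [[1.0, 0.05], [0.95, 0.15], [0.05, 1.0]]

merge_map = {"bank": merge_pseudo_senses("bank", bank_senses, vocab)}
tagged = tag_corpus([("bank", 0), ("bank", 1), ("bank", 2), ("the", 0)], merge_map)
print(merge_map["bank"])  # [0, 0, 2]
print(tagged)             # ['bank#0', 'bank#0', 'bank#2', 'the']
```

Merging with union-find makes the relation transitive. If sense 0 overlaps sense 1 and sense 1 overlaps sense 2, all three collapse into one tag even when senses 0 and 2 do not overlap directly. The abstract does not state whether the original framework behaves this way. The tagged corpus would then serve as input to the re-training in step (d).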

On Modeling Sense Relatedness in Multi-prototype Word Embedding

Do Multi-Sense Embeddings Improve Natural Language Understanding?

A Probabilistic Model for Learning Multi-Prototype Word Embeddings

Multi-sense Definition Modeling using Word Sense Decompositions

Real Multi-Sense or Pseudo Multi-Sense: an Approach to Improve Word Representation

Leveraging Human Prior Knowledge to Learn Sense Representations

Constructing High Quality Sense-specific Corpus and Word Embedding Via Unsupervised Elimination of Pseudo Multi-sense

Gaussian Mixture Embeddings for Multiple Word Prototypes

Modeling multi-prototype Chinese word representation learning for word similarity

Understanding and Improving Multi-Sense Word Embeddings via Extended Robust Principal Component Analysis

Learning Word Representations by Jointly Modeling Syntagmatic and Paradigmatic Relations

Beyond Bilingual: Multi-sense Word Embeddings using Multilingual Context

Bridging Text and Knowledge with Multi-Prototype Embedding for Few-Shot Relational Triple Extraction

Learning Sense-specific Word Embeddings By Exploiting Bilingual Resources

Multi-phase Word Sense Embedding Retrofitting with Lexical Ontology

Multi-phase Word Sense Embedding Learning Using a Corpus and a Lexical Ontology

Learning Context-Sensitive Word Embeddings with Neural Tensor Skip-Gram Model

Addressing the Polysemy Problem in Language Modeling with Attentional Multi-Sense Embeddings

Learning Context-Specific Word/Character Embeddings

Context-Specific and Multi-Prototype Character Representations

Bridge Text and Knowledge by Learning Multi-Prototype Entity Mention Embedding