Abstract: Word embedding algorithms produce very reliable feature representations of words that are used by neural network models across a constantly growing multitude of NLP tasks. As such, it is imperative for NLP practitioners to understand how their word representations are produced, and why they are so impactful. This work presents the Simple Embedder framework, which generalizes existing state-of-the-art word embedding algorithms (including Word2vec (SGNS) and GloVe) under the umbrella of generalized low rank models. We show that both of these algorithms attempt to produce embedding inner products that approximate pointwise mutual information (PMI) statistics in the corpus. Once these models are cast as Simple Embedders, comparing them reveals that these successful embedders all resemble a straightforward maximum likelihood estimate (MLE) of the PMI, parametrized by the inner product between embeddings. This MLE induces our proposed word embedding model, Hilbert-MLE, as the canonical representative of the Simple Embedder framework. We empirically compare these algorithms with evaluations on 17 datasets. Hilbert-MLE consistently achieves second-best performance on every extrinsic evaluation (news classification, sentiment analysis, POS-tagging, and supersense tagging), while the best-performing model varies by task. Moreover, Hilbert-MLE consistently exhibits the lowest variance in results with respect to the random initialization of the weights in bidirectional LSTMs. Our empirical results demonstrate that Hilbert-MLE is a very consistent word embedding algorithm that can be reliably integrated into existing NLP systems to obtain high-quality results.
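
As a rough illustration of the PMI statistics mentioned above, the following minimal sketch computes PMI values from windowed co-occurrence counts on a toy corpus. It is not the paper's implementation: the function names `cooccurrence_counts` and `pmi_table`, the window size, and the toy sentence are all assumptions made for this example. It uses the standard definition PMI(i, j) = log(N_ij · N / (N_i · N_j)), where N_ij is the co-occurrence count of words i and j, N_i and N_j are the marginal counts, and N is the total number of counted pairs.

```python
import math
from collections import Counter


def cooccurrence_counts(tokens, window=2):
    """Count symmetric (word, context) pairs within `window` positions.

    Hypothetical helper for illustration; real pipelines often also apply
    subsampling or distance weighting, which are omitted here.
    """
    pair_counts = Counter()
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pair_counts[(word, tokens[j])] += 1
    return pair_counts


def pmi_table(pair_counts):
    """Return PMI(i, j) = log(N_ij * N / (N_i * N_j)) for every observed pair."""
    total = sum(pair_counts.values())
    word_counts = Counter()
    context_counts = Counter()
    for (w, c), n in pair_counts.items():
        word_counts[w] += n
        context_counts[c] += n
    return {
        (w, c): math.log(n * total / (word_counts[w] * context_counts[c]))
        for (w, c), n in pair_counts.items()
    }


# Toy corpus (assumed for this sketch only).
tokens = "the cat sat on the mat the dog sat on the rug".split()
counts = cooccurrence_counts(tokens, window=2)
pmi = pmi_table(counts)

# These are the targets that an embedder in the sense described above would
# try to match with the inner product of a word vector and a context vector.
for pair in [("cat", "sat"), ("the", "on")]:
    print(pair, round(pmi[pair], 3))
```

Only observed pairs receive a PMI value here; pairs that never co-occur would have PMI of negative infinity, and how an embedding objective treats them is one of the points on which the algorithms discussed above differ.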

Word Embedding Algorithms as Generalized Low Rank Models and their Canonical Form
