PMIVec: a Word Embedding Model Guided by Point-Wise Mutual Information Criterion.

Minghong Yao, Liansheng Zhuang, Shafei Wang, Houqiang Li
DOI: https://doi.org/10.1007/s00530-022-00928-4
IF: 3.9
2022-01-01
Multimedia Systems
Abstract: Word embedding aims to represent each word with a dense vector that reveals the semantic similarity between words. Existing methods such as word2vec derive these representations by factorizing the word–context matrix into two parts, i.e., word vectors and context vectors. However, only one part is used to represent the word, which may damage the semantic similarity between words. To address this problem, this paper proposes a novel word embedding method based on the point-wise mutual information criterion (PMIVec). Our method explicitly learns the context vector as the final representation of each word and discards the word vector. To avoid damaging the semantic similarity between words, we normalize the word vector during training. Moreover, this paper uses point-wise mutual information to measure the semantic similarity between words, which is more consistent with human intuition about semantic similarity. Experiments on public data sets show that PMIVec consistently outperforms state-of-the-art models.
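The abstract relies on two standard quantities: the word–context co-occurrence matrix and point-wise mutual information, PMI(w, c) = log( P(w, c) / (P(w) P(c)) ). The sketch below is a minimal pure-Python illustration of computing count-based PMI values from a token list and comparing two words by the cosine similarity of their L2-normalized PMI rows. It is not the PMIVec training procedure, which the abstract does not specify. The function names, the symmetric context window, and the clipping of negative PMI to zero (PPMI) are all assumptions made for illustration.

```python
import math
from collections import Counter


def cooccurrence_counts(tokens, window=2):
    """Count (word, context) pairs within a symmetric window (assumed setup)."""
    pairs = Counter()
    for i, w in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs[(w, tokens[j])] += 1
    return pairs


def pmi_table(pairs):
    """PMI(w, c) = log(P(w, c) / (P(w) * P(c))), estimated from pair counts."""
    total = sum(pairs.values())
    word_counts, ctx_counts = Counter(), Counter()
    for (w, c), n in pairs.items():
        word_counts[w] += n  # marginal count of w as a word
        ctx_counts[c] += n   # marginal count of c as a context
    # n / total divided by (word/total * ctx/total) simplifies to n * total / (word * ctx)
    return {
        (w, c): math.log(n * total / (word_counts[w] * ctx_counts[c]))
        for (w, c), n in pairs.items()
    }


def pmi_row(pmi, word, contexts, positive=True):
    """Row of the word-context PMI matrix for `word` over a fixed context order.

    Unseen pairs get 0.0; with positive=True, negative PMI is clipped to 0 (PPMI).
    """
    row = []
    for c in contexts:
        v = pmi.get((word, c), 0.0)
        row.append(max(v, 0.0) if positive else v)
    return row


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine(u, v):
    """Cosine similarity as the dot product of L2-normalized vectors."""
    u, v = l2_normalize(u), l2_normalize(v)
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    # Hypothetical toy corpus, for illustration only.
    tokens = "the cat sat on the mat the dog sat on the rug".split()
    pmi = pmi_table(cooccurrence_counts(tokens, window=2))
    contexts = sorted({c for (_, c) in pmi})
    print(cosine(pmi_row(pmi, "cat", contexts), pmi_row(pmi, "dog", contexts)))
```

Words that share contexts ("cat" and "dog" both appear near "the", "sat", and "on") get overlapping positive PMI rows and therefore a higher cosine score. Real systems replace these sparse rows with dense learned vectors.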