Learning Numeral Embedding.

Chengyue Jiang, Zhonglin Nian, Kaihao Guo, Yinggong Zhao, Shanbo Chu, Libin Shen, Kewei Tu
DOI: https://doi.org/10.18653/v1/2020.findings-emnlp.235
2019-01-01
Findings
Abstract: Word embedding is an essential building block for deep learning methods for natural language processing. Although word embedding has been extensively studied over the years, the problem of how to effectively embed numerals, a special subset of words, is still underexplored. Existing word embedding methods do not learn numeral embeddings well because there are infinitely many numerals and each individual numeral appears only rarely in training corpora. In this paper, we propose two novel numeral embedding methods that can handle the out-of-vocabulary (OOV) problem for numerals. We first induce a finite set of prototype numerals using either a self-organizing map or a Gaussian mixture model. We then represent the embedding of a numeral as a weighted average of the prototype numeral embeddings. Numeral embeddings represented in this manner can be plugged into existing word embedding learning approaches such as skip-gram for training. We evaluated our methods and showed their effectiveness on four intrinsic and extrinsic tasks: word similarity, embedding numeracy, numeral prediction, and sequence labeling.
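
The weighted-average idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the weights are computed, so the softmax over negative absolute distance used here is an assumption, as are the `temperature` parameter, the function names, and the toy prototype values and vectors. In the paper the prototypes come from a self-organizing map or a Gaussian mixture model; here they are simply hard-coded.

```python
import math

def prototype_weights(x, prototypes, temperature=1.0):
    """Return normalized weights of numeral x over the prototypes.

    Assumption: a softmax over negative absolute distance. The abstract
    does not specify the weighting function.
    """
    scores = [-abs(x - p) / temperature for p in prototypes]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def numeral_embedding(x, prototypes, prototype_vectors, temperature=1.0):
    """Embed numeral x as a weighted average of prototype embeddings."""
    weights = prototype_weights(x, prototypes, temperature)
    dim = len(prototype_vectors[0])
    return [
        sum(w * vec[d] for w, vec in zip(weights, prototype_vectors))
        for d in range(dim)
    ]

# Hypothetical prototypes and toy 2-dimensional prototype embeddings.
prototypes = [1.0, 10.0, 100.0, 1000.0]
prototype_vectors = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]

# 42 never needs its own vocabulary entry: its vector is composed
# from the prototype vectors, which is how OOV numerals are handled.
vec = numeral_embedding(42.0, prototypes, prototype_vectors, temperature=10.0)
```

Because any numeral maps to a point in the span of a fixed set of prototype vectors, only those prototype vectors need to be trained, for example inside a skip-gram objective.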