On the Information Content of Predictions in Word Analogy Tests

Jugurta Montalvão
DOI: https://doi.org/10.48550/arXiv.2210.09972
2022-10-18
Computation and Language
Abstract: An approach is proposed to quantify, in bits of information, the actual relevance of analogies in analogy tests. The main component of this approach is a softaccuracy estimator that also yields entropy estimates with compensated biases. Experimental results obtained with pre-trained GloVe 300-D vectors and two public analogy test sets show that proximity hints are much more relevant than analogies in analogy tests, from an information content perspective. Accordingly, a simple word embedding model is used to predict that analogies carry about one bit of information, which is experimentally corroborated.