Optimize Hierarchical Softmax with Word Similarity Knowledge.

Zhixuan Yang, Chong Ruan, Caihua Li, Junfeng Hu
DOI: https://doi.org/10.17562/pb-55-2
2017-01-01
Polibits
Abstract: Hierarchical softmax is widely used to accelerate the training of neural language models and word embedding models. Traditionally, it was believed that the hierarchical tree of words should be organized by the semantic meaning of words. However, Mikolov et al. showed that high-quality word embeddings can also be trained by simply using the Huffman tree of words. To our knowledge, no work gives a theoretical analysis of how the hierarchical tree should be organized. In this paper, we try to answer this question theoretically by treating the tree structure as a parameter of the training objective function. As a result, we can show that the Huffman tree maximizes the (augmented) training function when word embeddings are random. Following this, we propose SemHuff, a new tree construction scheme based on adjusting the Huffman tree with word similarity knowledge. Experimental results show that word embeddings trained with the optimized hierarchical tree give better results in various tasks.
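For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of hierarchical softmax over a plain Huffman tree. It is not the SemHuff scheme, whose adjustment step is not described in the abstract. The function names (`build_huffman`, `hs_probability`), the toy frequencies, and the random inner-node vectors are all illustrative assumptions.

```python
import heapq
import itertools
import math
import random


def build_huffman(freqs):
    """Build a Huffman tree from {word: frequency}.

    Returns (paths, n_inner), where paths[word] is the root-to-leaf list of
    (inner_node_id, bit) pairs and n_inner is the number of inner nodes.
    """
    tie = itertools.count()  # tie-breaker so the heap never compares node payloads
    heap = [(f, next(tie), ("leaf", w)) for w, f in freqs.items()]
    heapq.heapify(heap)
    n_inner = 0
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), ("inner", n_inner, left, right)))
        n_inner += 1

    paths = {}

    def walk(node, prefix):
        if node[0] == "leaf":
            paths[node[1]] = prefix
        else:
            _, node_id, left, right = node
            walk(left, prefix + [(node_id, 0)])
            walk(right, prefix + [(node_id, 1)])

    walk(heap[0][2], [])
    return paths, n_inner


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hs_probability(word, hidden, paths, node_vecs):
    """P(word | hidden) as a product of binary decisions along the word's path."""
    p = 1.0
    for node_id, bit in paths[word]:
        score = sum(a * b for a, b in zip(node_vecs[node_id], hidden))
        # bit 0 means "go left" with probability sigmoid(score); bit 1 is the complement
        p *= sigmoid(score) if bit == 0 else sigmoid(-score)
    return p


# Toy usage with made-up frequencies and random vectors (assumptions for illustration).
freqs = {"the": 50, "cat": 10, "sat": 8, "mat": 2}
paths, n_inner = build_huffman(freqs)
rng = random.Random(0)
dim = 4
node_vecs = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_inner)]
hidden = [rng.uniform(-1, 1) for _ in range(dim)]
total = sum(hs_probability(w, hidden, paths, node_vecs) for w in freqs)
```

Because each inner node splits its probability mass between its two children, the word probabilities sum to one for any choice of vectors, and frequent words sit closer to the root, so they need fewer sigmoid evaluations per training step.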