A Fusion Method for Word Vector Based on Fasttext-KdTree.

Yu Dai, Hongcui Hua, Chenyan Ma, Huixue Zhang, Lei Yang
DOI: https://doi.org/10.1109/cbd.2019.00049
2019-01-01
Abstract: Text categorization is an important part of natural language processing and an active research topic. However, current text categorization techniques still lose some semantic information when they encounter new (out-of-vocabulary) words. To address this, the paper proposes a word-vector fusion method based on fastText-kdTree. First, the method trains word vectors with the fastText model and fills in vectors for unknown words using the n-gram model. Second, it uses kd-tree nearest-neighbor search to find multiple word vectors similar to each unknown word. Finally, it fuses these word vectors into a new word-vector representation through a gate mechanism. Experimental results show that the proposed method achieves 91.08% classification accuracy.
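The three steps in the abstract can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation: the abstract does not give the n-gram sizes, the number of neighbors, or the form of the gate, so every function name, parameter, and toy vector below is a hypothetical stand-in. In particular, the unknown-word vector is assumed to be the mean of its known character n-gram vectors, and the gate is assumed to be an element-wise sigmoid that mixes that vector with the mean of its kd-tree neighbors.

```python
import heapq
import math

def char_ngrams(word, n_min=3, n_max=4):
    # fastText-style character n-grams with boundary markers (sizes are assumed).
    token = f"<{word}>"
    return [token[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(token) - n + 1)]

def oov_vector(word, ngram_vectors, dim):
    # Step 1 (assumed form): average the vectors of the known n-grams.
    hits = [ngram_vectors[g] for g in char_ngrams(word) if g in ngram_vectors]
    if not hits:
        return [0.0] * dim
    return [sum(col) / len(hits) for col in zip(*hits)]

def build_kdtree(items, depth=0):
    # items: list of (vector, word); split on one axis per level at the median.
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    return {"item": items[mid], "axis": axis,
            "left": build_kdtree(items[:mid], depth + 1),
            "right": build_kdtree(items[mid + 1:], depth + 1)}

def _search(node, query, k, heap):
    # heap holds (-distance, word, vector); heap[0] is the worst kept neighbor.
    if node is None:
        return
    vec, word = node["item"]
    dist = math.dist(vec, query)
    if len(heap) < k:
        heapq.heappush(heap, (-dist, word, vec))
    elif dist < -heap[0][0]:
        heapq.heapreplace(heap, (-dist, word, vec))
    diff = query[node["axis"]] - vec[node["axis"]]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    _search(near, query, k, heap)
    # Visit the far side only if it could still hold a closer point.
    if len(heap) < k or abs(diff) < -heap[0][0]:
        _search(far, query, k, heap)

def nearest_words(tree, query, k):
    # Step 2: k nearest known words, closest first.
    heap = []
    _search(tree, query, k, heap)
    return [(word, vec) for _, word, vec in sorted(heap, reverse=True)]

def gate_fuse(oov_vec, neighbor_vecs, w_oov=0.0, w_nbr=0.0, bias=0.0):
    # Step 3 (assumed gate): g = sigmoid(w_oov*o + w_nbr*m + bias) per dimension,
    # fused = g*o + (1-g)*m, where m is the mean neighbor vector.
    mean = [sum(col) / len(neighbor_vecs) for col in zip(*neighbor_vecs)]
    fused = []
    for o, m in zip(oov_vec, mean):
        g = 1.0 / (1.0 + math.exp(-(w_oov * o + w_nbr * m + bias)))
        fused.append(g * o + (1.0 - g) * m)
    return fused

# Toy data, invented purely for illustration.
word_vectors = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0]}
ngram_vectors = {"<ca": [1.0, 0.0], "cat": [0.8, 0.2]}

tree = build_kdtree([(vec, word) for word, vec in word_vectors.items()])
unknown = oov_vector("cats", ngram_vectors, dim=2)
neighbors = nearest_words(tree, unknown, k=2)
fused = gate_fuse(unknown, [vec for _, vec in neighbors])
print([word for word, _ in neighbors], fused)
```

In a trained model the gate weights would be learned together with the classifier; here they default to zero, so the gate is 0.5 and the fused vector is simply the midpoint between the n-gram vector and the neighbor mean.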