A Method of Feature Selection Based on Word2Vec in Text Categorization

Wenfeng Tian, Jun Li, Hongguang Li
DOI: https://doi.org/10.23919/chicc.2018.8483345
2018-01-01
Abstract: In text categorization, classifier performance degrades as the feature dimension increases. The main purpose of feature selection is to remove irrelevant and redundant features and thereby reduce the feature dimension. Traditional feature selection methods, such as CHI (chi-square), IG (information gain), and DF (document frequency), consider only how often features occur and ignore feature semantics and part-of-speech information. The vector representations of words learned by Word2Vec models have been shown to carry semantic meaning and are useful in various NLP tasks. Based on the word vectors generated by Word2Vec, the paper proposes the Word2Vec-SM algorithm to reduce the dimensionality of the features. The Word2Vec-SM algorithm is evaluated experimentally.