The Short Text Classification Method Based on CHI Feature Selection and LDA Topic Model

Cheng ZHENG, Da-kang XIONG, Qian-qian LIU
2014-01-01
Abstract: Chinese short texts contain few words and carry only weak signals, so common text classification methods do not perform well on them. In the vector space model, the document vector has a very high dimension, which makes algorithms inefficient. Traditional feature selection methods are based on mathematical statistics and ignore the semantic relationships between terms in the text. This paper therefore introduces a method based on CHI feature selection and the LDA topic model to classify Chinese short texts. In this method, the output of the LDA topic model is used to extend the features of the data set, so that the classification algorithm draws on both statistical and semantic information. The experimental results show that the proposed method improves text classification performance.
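The abstract does not give the exact formulas or how the two feature sets are combined, so the following is only a minimal sketch under stated assumptions. It scores terms with the standard chi-square (CHI) statistic against one class, keeps the top-scoring terms, and then appends per-document topic proportions to the term indicators. The function names, the toy English corpus, and the concatenation step are all hypothetical; the topic proportions are hard-coded placeholders standing in for the output of an LDA model, which is not trained here.

```python
from collections import Counter


def chi_square_scores(docs, labels, target):
    """Chi-square score of every term with respect to one target class.

    docs is a list of token lists; labels is the parallel list of classes.
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == target)
    df_all = Counter()  # number of documents containing the term
    df_pos = Counter()  # number of target-class documents containing the term
    for tokens, y in zip(docs, labels):
        for term in set(tokens):
            df_all[term] += 1
            if y == target:
                df_pos[term] += 1

    scores = {}
    for term, df in df_all.items():
        a = df_pos[term]       # in class, contains term
        b = df - a             # other classes, contains term
        c = n_pos - a          # in class, lacks term
        d = n - n_pos - b      # other classes, lacks term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def extend_features(tokens, selected_terms, topic_probs):
    """Assumed combination: binary term indicators followed by topic proportions."""
    present = set(tokens)
    term_part = [1.0 if term in present else 0.0 for term in selected_terms]
    return term_part + list(topic_probs)


# Toy corpus (hypothetical, in English for readability).
docs = [
    ["football", "match", "goal"],
    ["goal", "team"],
    ["stock", "market"],
    ["market", "price", "team"],
]
labels = ["sports", "sports", "finance", "finance"]

scores = chi_square_scores(docs, labels, target="sports")
selected = select_top_k(scores, k=2)

# Placeholder topic proportions for the second document; a real pipeline
# would take these from a trained LDA model.
vector = extend_features(docs[1], selected, topic_probs=[0.9, 0.1])
print(selected, vector)
```

The extended vector can then be passed to any standard classifier. Concatenation is only one plausible reading of "extend the features"; the paper may combine the LDA output with the CHI-selected terms differently.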