Quantify the Short-text for a Specific Topic

TIAN Rui, YAN Dan-feng
DOI: https://doi.org/10.3969/j.issn.1003-6970.2012.11.053
2012-01-01
Abstract: Chinese text vectorization technology is relatively mature, but the characteristics of short texts cause many problems when traditional text vectorization methods are applied to them. This paper studies in depth two important aspects of vectorization, feature selection and weight calculation, and compares the advantages and disadvantages of existing approaches to each. Based on the characteristics of short texts, it proposes an improved feature extraction method and an improved weight calculation method. When calculating weights, the method takes the influence of word length into account by introducing the concept of a word-length factor. Experimental results verify the feasibility and advantages of the method.
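The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea, assuming a standard TF-IDF weight multiplied by a word-length factor. The function names `tfidf_with_length_factor` and `linear_length_factor`, and the linear, capped form of the factor, are hypothetical illustrations and not the authors' method. Texts are assumed to be already segmented into words.

```python
import math
from collections import Counter
from typing import Callable, Dict, List


def linear_length_factor(term: str, max_len: int = 4) -> float:
    """Hypothetical word-length factor (an assumption, not the paper's):
    grows linearly with the number of characters in the term and is
    capped at 1.0 once len(term) >= max_len."""
    return min(len(term), max_len) / max_len


def tfidf_with_length_factor(
    docs: List[List[str]],
    length_factor: Callable[[str], float] = linear_length_factor,
) -> List[Dict[str, float]]:
    """Weight each term of each pre-segmented short text as
    tf * idf * length_factor(term)."""
    n_docs = len(docs)
    # Document frequency: number of texts containing each term at least once.
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        doc_weights = {}
        for term, count in counts.items():
            tf = count / total                  # relative term frequency
            idf = math.log(n_docs / df[term])   # classic inverse document frequency
            doc_weights[term] = tf * idf * length_factor(term)
        weights.append(doc_weights)
    return weights


# Usage: two toy short texts, already word-segmented.
docs = [["短文本", "词"], ["特征"]]
for w in tfidf_with_length_factor(docs):
    print(w)
```

In the toy example, the three-character term and the one-character term in the first text have the same tf and idf, so the hypothetical factor alone gives the longer term three times the weight. The factor is passed in as a function so that a different form, such as whatever the paper actually uses, can be substituted without changing the TF-IDF part.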