Improving short text classification using public search engines

Meng Wang, Lanfen Lin, Wang Jing, Penghua Yu, Jiaolong Liu, Fei Xie
DOI: https://doi.org/10.1007/978-3-642-39515-4_14
2013-01-01
Abstract: In Web 2.0 applications, many user-generated texts are only 3 to 10 words long. Accurate classification of these short texts can help readers find the messages they need more quickly. In this paper, we propose a method that expands short texts with the help of public search engines in two steps. First, we submit the short text as a query to a public search engine and crawl the result pages. Second, we treat the text of the result pages as background knowledge for the original short text and extract the feature vector from it. Because the number of result pages can be chosen, we can gather a large enough corpus for feature vector extraction to address the data sparseness problem. We conducted experiments under different settings, and the empirical results indicate that this enriched representation of short texts can substantially improve classification performance. © 2013 Springer-Verlag Berlin Heidelberg.
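The two-step expansion described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the tokenizer, the feature weighting, or the classifier, so everything below is an assumption: `search` is a hypothetical stand-in for "query a public search engine and crawl the result pages" (mocked offline here), features are plain term frequencies, and cosine similarity is used only to show how expansion reduces sparseness. It is not the authors' implementation.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical stand-in for "query a public search engine and crawl the result pages":
# takes a query and a page count, returns the plain text of the top result pages.
SearchFn = Callable[[str, int], List[str]]


def tokenize(text: str) -> List[str]:
    # Simplifying assumption: lowercase, keep alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def expand_short_text(short_text: str, search: SearchFn, num_pages: int) -> Counter:
    """Term-frequency vector built from the short text plus its top `num_pages` result pages."""
    features = Counter(tokenize(short_text))
    # Treat the result pages as background knowledge for the original short text.
    for page in search(short_text, num_pages)[:num_pages]:
        features.update(tokenize(page))
    return features


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical offline mock of a search engine so the example runs without network access.
MOCK_RESULTS: Dict[str, List[str]] = {
    "iphone battery drains fast": ["iPhone battery drain fix", "phone battery life tips"],
    "phone power dies quickly": ["phone battery dies quickly fix", "battery drain on phone"],
}


def mock_search(query: str, k: int) -> List[str]:
    return MOCK_RESULTS.get(query, [])[:k]


a, b = "iphone battery drains fast", "phone power dies quickly"
raw_sim = cosine(Counter(tokenize(a)), Counter(tokenize(b)))
expanded_sim = cosine(expand_short_text(a, mock_search, 2), expand_short_text(b, mock_search, 2))
print(f"raw: {raw_sim:.3f}  expanded: {expanded_sim:.3f}")
```

In this toy run the raw similarity is 0 because the two short texts share no tokens, while the expanded vectors overlap on terms such as "battery" and "phone". The `num_pages` parameter plays the role of the "proper number of the result pages" mentioned above: more pages give denser vectors at the cost of more crawling and potentially noisier background text.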