Short Documents Classification Method in Very Large Text Database

Wang Yongheng, Jia Yan, Yang Shuqiang
DOI: https://doi.org/10.3321/j.issn:1002-8331.2006.22.002
2006-01-01
Abstract: With the rapid development of information technology, huge amounts of data are accumulated. A vast amount of this data appears as short documents. Classifying such short documents is very useful for automatically extracting knowledge from the data. However, most current classification algorithms cannot achieve acceptable accuracy, since keywords appear less often in short documents and labeled training examples are usually very few. Some classification algorithms based on semantic information are more accurate, but they are too inefficient to process very large document sets. In this paper, we propose a novel classification method based on a semantic text features graph and a kNN-like method. Our experimental study shows that our algorithm is more accurate and efficient than other classification algorithms when classifying large-scale short documents.