Massive Short Documents Classification Method Based on Frequent Term Set Clustering

WANG Yong-heng, JIA Yan, YANG Shu-qiang
DOI: https://doi.org/10.3969/j.issn.1000-7024.2007.08.003
2007-01-01
Abstract: With the rapid development of information technology, huge amounts of data have accumulated, and much of these data take the form of short documents. Automatically classifying such short documents is a useful way to extract knowledge from the data. However, most current classification algorithms cannot achieve acceptable accuracy on short documents, because keywords appear only a few times in each one. Some classification algorithms based on semantic information are more accurate, but they are too inefficient to process very large document sets. A novel classification method based on frequent term set clustering is proposed. This method uses frequent term set clustering to compress massive data and uses semantic information to improve accuracy. Experimental results show that this method is more accurate and more efficient than other classification algorithms when classifying massive short documents.
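The abstract does not say how the frequent term sets are mined or how documents are assigned to clusters. The sketch below is only an illustration of the general idea of frequent term set clustering, under assumptions that are not taken from the paper: term sets are mined with a simple Apriori-style level-wise search, and clusters are formed by greedily choosing the term set that covers the most still-unassigned documents. Each cluster can then be represented by its term set, which is one way a large collection could be compressed before a more expensive (for example, semantic) classifier runs. The function names `frequent_term_sets` and `greedy_cluster` and the parameters `min_support` and `max_size` are hypothetical. The semantic step is not shown.

```python
from collections import Counter


def frequent_term_sets(docs, min_support, max_size=2):
    """Hypothetical helper: Apriori-style mining of term sets that occur
    in at least `min_support` documents (each doc is a frozenset of terms)."""
    frequent = {}
    # Level 1: count single terms across all documents.
    term_counts = Counter(term for doc in docs for term in doc)
    current = {frozenset([t]) for t, c in term_counts.items() if c >= min_support}
    for ts in current:
        frequent[ts] = term_counts[next(iter(ts))]
    size = 1
    while current and size < max_size:
        size += 1
        # Candidates: unions of two frequent sets that have exactly `size` terms.
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        counts = Counter()
        for doc in docs:
            for cand in candidates:
                if cand <= doc:  # candidate is contained in the document
                    counts[cand] += 1
        current = {c for c in candidates if counts[c] >= min_support}
        for ts in current:
            frequent[ts] = counts[ts]
    return frequent


def greedy_cluster(docs, frequent):
    """Hypothetical helper: repeatedly pick the frequent term set covering the
    most unassigned documents and make those documents one cluster."""
    unassigned = set(range(len(docs)))
    clusters = []
    # Deterministic order: larger (more specific) sets first, then alphabetical;
    # with a strict ">" below, ties go to the earlier set in this order.
    ordered = sorted(frequent, key=lambda ts: (-len(ts), sorted(ts)))
    while unassigned:
        best, best_cover = None, set()
        for ts in ordered:
            cover = {i for i in unassigned if ts <= docs[i]}
            if len(cover) > len(best_cover):
                best, best_cover = ts, cover
        if best is None:
            break  # no frequent term set covers any remaining document
        clusters.append((best, sorted(best_cover)))
        unassigned -= best_cover
    if unassigned:
        # Residual group for documents that contain no frequent term set.
        clusters.append((frozenset(), sorted(unassigned)))
    return clusters


# Toy short documents (illustrative data, not from the paper).
raw = [
    "cheap flight tickets to paris",
    "paris flight deals",
    "cheap hotel in paris",
    "python list comprehension",
    "python list sort",
    "sort a python dict",
]
docs = [frozenset(s.split()) for s in raw]
freq = frequent_term_sets(docs, min_support=2)
clusters = greedy_cluster(docs, freq)
for terms, members in clusters:
    print(sorted(terms), members)
```

On this toy input the sketch yields two clusters, one labelled by `paris` (documents 0 to 2) and one by `python` (documents 3 to 5). Because plain coverage favours short, general term sets, a practical system would likely need a different selection criterion, such as one that penalizes overlap between clusters. That choice is left open here.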