A Fast Clustering Algorithm for Abnormal and Short Texts

HUANG Yong-guang, LIU Ting, CHE Wan-xiang, HU Xiao-guang
DOI: https://doi.org/10.3969/j.issn.1003-0077.2007.02.010
2007-01-01
Abstract: This paper focuses on short texts of the kind found in mobile short messages and chat rooms. Because these texts are irregular in style and highly similar to one another, we call them abnormal texts. We propose an efficient clustering algorithm based on a duplicate-information deletion algorithm. It takes the features of abnormal short texts into account and uses special techniques, such as feature-code extraction and code compression, to solve this problem. Experiments show that a clustering system based on this algorithm can process millions of abnormal short texts per hour with high accuracy.
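
The abstract does not say how feature codes are extracted or compressed, so the following is only a minimal sketch of the general idea of clustering by duplicate elimination, not the authors' method. It assumes that a feature code can be approximated by stripping everything except letters from a text, and that compression can be approximated by truncating an MD5 digest to 64 bits. The names `feature_code` and `cluster_texts` are hypothetical.

```python
import hashlib
from collections import defaultdict


def feature_code(text: str) -> str:
    """Return a compact code for a short text (illustrative assumption only)."""
    # Assumed normalization: lowercase, then keep only alphabetic characters
    # (this includes CJK characters), so spacing, digits, and punctuation
    # noise do not affect the code.
    normalized = "".join(ch for ch in text.lower() if ch.isalpha())
    # Assumed compression: keep the first 64 bits of the MD5 digest.
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()[:16]


def cluster_texts(texts):
    """Group texts that share a feature code, in one pass over the input."""
    clusters = defaultdict(list)
    for text in texts:
        clusters[feature_code(text)].append(text)
    return list(clusters.values())


if __name__ == "__main__":
    messages = [
        "Call me NOW!!!",
        "call me now",
        "c a l l me now 123",
        "hello there",
    ]
    for group in cluster_texts(messages):
        print(group)
```

Each text is hashed once and placed with a dictionary lookup, so the running time grows linearly with the number of texts. That is the property that makes a duplicate-elimination approach attractive for high-volume message streams. Exact matching on a normalized code only catches near-identical texts; a real system would likely need a more tolerant code.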