Incremental Algorithm for Clustering Texts in Internet-Oriented Topic Detection

YIN Feng-jing, XIAO Wei-dong, GE Bin, LI Fang-fang
DOI: https://doi.org/10.3969/j.issn.1001-3695.2011.01.013
2011-01-01
Abstract: To meet the needs of topic detection for monitoring public opinion on the Internet, this paper proposes an incremental clustering algorithm called ICIT that addresses the two main disadvantages of the single-pass algorithm: sensitivity to the order of inputs and low precision. ICIT inherits the simple principle of single-pass clustering, so Internet texts can still be clustered in real time, and overcomes its shortcomings in five ways: it selects only nouns and verbs from the content to form the content vector; it combines the title vector with the content vector to represent the text better; it adopts an average-link comparison strategy; it introduces generations to process texts in batches; and it adds a stage in which texts are reconsidered and their cluster assignments adjusted after the first clustering. Experiments confirm ICIT's validity and practicality in improving the precision of topic detection.
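The abstract gives no pseudocode, so the following is only a minimal sketch of the general idea: single-pass clustering with an average-link comparison. It is not the authors' ICIT implementation. The function names, the `title_weight` parameter, the similarity threshold, and the use of cosine similarity over term counts are all assumptions made for illustration. Tokens are assumed to be already filtered to nouns and verbs, and the generation (batch) and reconsideration stages are omitted.

```python
import math
from collections import Counter


def vectorize(title_tokens, content_tokens, title_weight=2.0):
    """Build one term-weight vector from title and content tokens.

    Assumption: title terms are simply added with a fixed extra weight;
    the abstract does not say how the two vectors are combined.
    """
    vec = Counter(content_tokens)
    for term in title_tokens:
        vec[term] += title_weight
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def average_link(vec, cluster):
    """Average similarity between a text and every member of a cluster."""
    return sum(cosine(vec, member) for member in cluster) / len(cluster)


def cluster_incrementally(vectors, threshold=0.3):
    """Assign each text to the most similar cluster, or open a new one.

    Each text is seen once, in arrival order (the single-pass principle).
    The threshold value is a hypothetical choice, not taken from the paper.
    """
    clusters = []
    for vec in vectors:
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = average_link(vec, cluster)
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best.append(vec)
        else:
            clusters.append([vec])
    return clusters


if __name__ == "__main__":
    docs = [
        (["flood"], ["flood", "river", "rise"]),
        (["flood"], ["flood", "river", "warn"]),
        (["cup"], ["team", "win", "match"]),
    ]
    vectors = [vectorize(title, content) for title, content in docs]
    for i, cluster in enumerate(cluster_incrementally(vectors)):
        print(i, [sorted(v) for v in cluster])
```

Comparing a new text against the average over all cluster members, rather than against a single representative, makes the assignment less dependent on which text happened to arrive first. This is one plausible reading of why an average-link strategy would reduce order sensitivity.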