Sentences clustering based automatic summarization
Jianhui Wang, Shuigeng Zhou, Yunfa Hu
DOI: https://doi.org/10.1109/ICMLC.2003.1264442
2003-01-01
Abstract: Research on automatic summarization follows two main approaches: one based on statistics and the other based on message understanding. The former is domain-independent but less accurate; the latter is more accurate but domain-dependent. This paper presents an algorithm that summarizes a document by extracting subtopics from its sentences. It combines statistics with partial message understanding in order to produce better summaries without depending on the domain. Because it is difficult to determine the length of a summary manually, the algorithm also aims to produce a summary of appropriate length. To this end, a new module of mutual dependence is proposed and applied to segmentation, so that accurate features can be selected for the summarizing algorithm. New rules are then introduced for evaluating sentences within the summarizing algorithm. Finally, a new task-based algorithm for objectively evaluating summarization is presented.
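
The abstract does not give the algorithm's details, so the following is only a minimal illustrative sketch of the general idea of clustering sentences into subtopics and extracting one sentence per subtopic. Everything in it is an assumption rather than the authors' method: the naive sentence splitter, the bag-of-words vectors, cosine similarity, the single-pass clustering, the `threshold` parameter, and all function names are hypothetical. In particular, it does not implement the paper's mutual-dependence segmentation or its sentence evaluation rules. The summary length is not fixed in advance; it falls out of the number of clusters found.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter on ., !, ? followed by whitespace (an assumption).
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def vectorize(sentence):
    # Bag-of-words term counts over lowercase alphanumeric tokens.
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_sentences(vectors, threshold=0.3):
    # Single-pass clustering: a sentence joins the most similar existing
    # cluster if the similarity reaches the threshold, else starts a new one.
    clusters = []
    for i, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"members": [i], "centroid": Counter(vec)})
        else:
            best["members"].append(i)
            # Summed counts act as the centroid; cosine ignores the scale.
            best["centroid"].update(vec)
    return clusters


def summarize(text, threshold=0.3):
    # One representative sentence per cluster (subtopic), in document order.
    sentences = split_sentences(text)
    vectors = [vectorize(s) for s in sentences]
    chosen = []
    for cluster in cluster_sentences(vectors, threshold):
        rep = max(cluster["members"],
                  key=lambda i: cosine(vectors[i], cluster["centroid"]))
        chosen.append(rep)
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    sample = ("Cats chase mice. Cats chase birds. "
              "Stocks fell sharply today. Stocks fell again today.")
    for line in summarize(sample):
        print(line)
```

In this sketch a lower `threshold` merges more sentences into each subtopic and yields a shorter summary, while a higher one yields more clusters and a longer summary.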