Simultaneous Clustering and Noise Detection for Theme-based Summarization.

Xiaoyan Cai, Renxian Zhang, Dehong Gao, Wenjie Li
2011-01-01
Abstract: Multi-document summarization aims to produce a concise summary that contains salient information from a set of source documents. Since documents often cover a number of topical themes, with each theme represented by a cluster of highly related sentences, sentence clustering plays a pivotal role in theme-based summarization. Moreover, noting that real-world datasets always contain noise, which inevitably degrades clustering performance, we incorporate noise detection into spectral clustering to generate ordinary sentence clusters and one noise sentence cluster. We are also interested in making the theme-based summaries biased towards a user’s query. The effectiveness of the proposed approaches is demonstrated by both cluster quality analysis and summarization evaluation conducted on the DUC generic and query-oriented summarization datasets.