Multi-document Summarization via LDA and Density Peaks Based Sentence-Level Clustering

Baoyan Wang, Yuexian Zou, Jian Zhang, Jun Jiang, Yi Liu
DOI: https://doi.org/10.1007/978-981-13-1648-7_27
2017-01-01
Abstract: In this paper, we present a novel unsupervised extractive multi-document summarization method that ranks sentences using an integrated sentence scoring method. Cluster-based methods tend to ignore the informativeness of words, while Latent Dirichlet Allocation (LDA) based methods are inclined to extract overly long sentences and cannot remove redundancy directly. Both kinds of methods select the highest-scoring sentences to generate summaries, which does not necessarily yield optimal summaries. Our method takes four key aspects of sentences into account concurrently: it applies LDA to compute term weights for words and evaluate the informativeness of sentences, and then applies Density Peaks Clustering (DPC) to assess the relevance and diversity of sentences simultaneously. Our method achieves the best performance on the DUC2004 dataset, outperforming state-of-the-art methods such as DUC2004 Best, R2N2_ILP [3], and WCS [13].
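The abstract does not give the scoring formulas, so the following is only a minimal sketch of the two generic quantities that Density Peaks Clustering computes, applied to sentence vectors: a local density (how many other sentences lie within a cutoff distance, a plausible proxy for relevance) and the distance to the nearest sentence of higher density (a plausible proxy for diversity). The function names, the cosine distance, and the `cutoff` parameter are assumptions made for illustration, not the authors' implementation.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def density_peaks_scores(vectors, cutoff):
    """Compute the generic DPC quantities for each sentence vector.

    rho[i]   : number of other sentences closer than `cutoff` (assumed threshold).
    delta[i] : distance to the nearest sentence with strictly higher rho;
               for sentences with the highest rho (ties included), the
               largest distance to any sentence is used instead.
    """
    n = len(vectors)
    dist = [[cosine_distance(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]

    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff)
           for i in range(n)]

    delta = []
    for i in range(n):
        to_denser = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(to_denser) if to_denser else max(dist[i]))
    return rho, delta
```

In this sketch, a sentence with both high `rho` and high `delta` sits at the centre of a dense group and far from any denser group, which is the intuition behind assessing relevance and diversity together. How these values are combined with the LDA-based informativeness score is left to the paper itself.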