Abstract: Multi-document summarization requires a summarizer to generate a short text that covers the important information in the source documents while maintaining content diversity. To meet these dual requirements of coverage and diversity, this study introduces a novel method. First, a class tree is constructed through hierarchical clustering of the documents. Then, a sentence selection method based on the class tree is used to generate the summary. Specifically, a top-down traversal is performed on the class tree, during which sentences are selected from each node based on their similarity to the centroid of the documents within the node and their dissimilarity to the centroid of the documents outside the node. Sentences selected from the root node reflect the commonality of all documents, and sentences selected from the sub-nodes reflect the specificity of the respective subclasses. Experimental results on the standard text summarization datasets DUC'2002, DUC'2003, and DUC'2004 demonstrate that the proposed method significantly outperforms a variant that considers only the commonality of all documents, achieving average improvements of up to 1.54 and 1.42 in ROUGE-1 and ROUGE-L scores, respectively. The method also significantly outperforms another variant that considers only the specificity of subclasses, achieving average improvements of up to 2.16 and 2.01 in ROUGE-1 and ROUGE-L scores, respectively. Furthermore, extensive experiments on the DUC'2004 and Multi-News datasets show that the proposed method outperforms many competitive supervised and unsupervised multi-document summarization methods and achieves strong results.
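The pipeline described in the abstract (hierarchical clustering into a class tree, then top-down sentence selection) can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words vectors, the centroid-based merge criterion, the scoring rule `sim(inside) - lam * sim(outside)`, and all function names and parameters (`build_class_tree`, `summarize`, `per_node`, `lam`) are assumptions made for illustration. The sketch also assumes at least one input document.

```python
import math
import re
from collections import Counter, deque


def bow(text):
    """Bag-of-words term-frequency vector (an assumed representation)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(docs, ids):
    """Summed term vector of the chosen documents (cosine ignores scale)."""
    total = Counter()
    for i in ids:
        for sent in docs[i]:
            total.update(bow(sent))
    return total


def build_class_tree(docs):
    """Agglomerative clustering: repeatedly merge the two most similar clusters."""
    nodes = [{"docs": [i], "children": []} for i in range(len(docs))]
    while len(nodes) > 1:
        best = None
        for x in range(len(nodes)):
            for y in range(x + 1, len(nodes)):
                s = cosine(centroid(docs, nodes[x]["docs"]),
                           centroid(docs, nodes[y]["docs"]))
                if best is None or s > best[0]:
                    best = (s, x, y)
        _, x, y = best
        merged = {"docs": sorted(nodes[x]["docs"] + nodes[y]["docs"]),
                  "children": [nodes[x], nodes[y]]}
        nodes = [n for k, n in enumerate(nodes) if k not in (x, y)] + [merged]
    return nodes[0]


def summarize(docs, max_sentences=3, per_node=1, lam=0.5):
    """Top-down (breadth-first) traversal; pick the best sentences at each node."""
    root = build_class_tree(docs)
    all_ids = set(range(len(docs)))
    summary, queue = [], deque([root])
    while queue and len(summary) < max_sentences:
        node = queue.popleft()
        inside = centroid(docs, node["docs"])
        # For the root there are no outside documents, so this vector is empty
        # and the dissimilarity term contributes nothing.
        outside = centroid(docs, all_ids - set(node["docs"]))
        candidates = [s for i in node["docs"] for s in docs[i]
                      if s not in summary]
        ranked = sorted(
            candidates,
            key=lambda s: cosine(bow(s), inside) - lam * cosine(bow(s), outside),
            reverse=True,
        )
        summary.extend(ranked[:per_node])
        queue.extend(node["children"])
    return summary[:max_sentences]
```

Each document is passed as a list of sentence strings, for example `summarize([["Sentence one.", "Sentence two."], ["Another document."]])`. Because the root is visited first, the first selected sentence reflects what all documents share, while later selections favor sentences that are typical of a subclass and atypical of the rest.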

Combining N-Gram and Dependency Word Pair for Multi-document Summarization

Deep Dependency Substructure-Based Learning for Multidocument Summarization

Automatic Document Summarization Via Deep Neural Networks

Query-focused Multi-document Summarization: Combining a Novel Topic Model with Graph-based Semi-supervised Learning

Towards A Unified Approach Based On Affinity Graph To Various Multi-Document Summarizations

Automatic multi-document summarization based on new sentence similarity measures

Co-clustering Sentences and Terms for Multi-document Summarization

SemSUM: Semantic Dependency Guided Neural Abstractive Summarization

Mining Both Commonality and Specificity From Multiple Documents for Multi-Document Summarization

SgSum: Transforming Multi-document Summarization into Sub-graph Selection

Towards a Unified Approach to Simultaneous Single-Document and Multi-Document Summarizations

Improved affinity graph based multi-document summarization

Multi-Granularity Interaction Network for Extractive and Abstractive Multi-Document Summarization

Manifold-Ranking Based Topic-Focused Multi-Document Summarization

Abstractive Multi-Document Summarization Via Joint Learning with Single-Document Summarization

Single Document Summarization with Document Expansion

Document-aware Positional Encoding and Linguistic-guided Encoding for Abstractive Multi-document Summarization

Joint Parsing and Generation for Abstractive Summarization

An Integrated Graph Model For Document Summarization

Single document summarization using the information from documents with the same topic

Multi-document Summarization Using Support Vector Regression