Improving Abstractive Multi-document Summarization with Predicate-Argument Structure Extraction.

Huangfei Cheng, Jiawei Wu, Tiantian Li, Bin Cao, Jing Fan
DOI: https://doi.org/10.1007/978-3-031-20865-2_20
2022-01-01
Abstract: Multi-Document Summarization (MDS) aims to generate a concise summary for a collection of documents on the same topic. However, the fixed input length and the large amount of redundancy in source documents make pre-trained models less effective in MDS. In this paper, we propose a two-stage abstractive MDS model based on Predicate-Argument Structure (PAS). In the first stage, we divide the redundancy of documents into intra-sentence redundancy and inter-sentence redundancy. For intra-sentence redundancy, our model utilizes Semantic Role Labeling (SRL) to convert each sentence to a PAS. Benefiting from PAS, we can filter out redundant content while preserving the salient information. For inter-sentence redundancy, we introduce a novel similarity calculation method that incorporates semantic and syntactic knowledge to identify and remove duplicate information. These two steps significantly shorten the input length and eliminate document redundancy, which is crucial for MDS. In the second stage, we sort the filtered PASs to ensure that important content appears at the beginning and concatenate them into a new document. We employ the pre-trained model ProphetNet to generate an abstractive summary from the new document. Our model combines the advantages of ProphetNet and PAS on global information to generate comprehensive summaries. We conduct extensive experiments on three standard MDS datasets. All experiments demonstrate that our model outperforms the abstractive MDS baselines as measured by ROUGE scores. Furthermore, the first stage of our model can improve the performance of other pre-trained models in abstractive MDS.
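The first-stage filtering and the second-stage ordering described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `PAS` class, the function names, and the 0.6 threshold are hypothetical, and plain Jaccard token overlap stands in for the paper's similarity method, which the abstract says also uses semantic and syntactic knowledge but does not specify. The SRL step and the ProphetNet generation step are not shown.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PAS:
    """Hypothetical predicate-argument structure: a predicate and its arguments."""
    predicate: str
    arguments: tuple[str, ...]

    def tokens(self) -> set[str]:
        # Lowercased bag of words over the predicate and all arguments.
        phrases = [self.predicate, *self.arguments]
        return {tok.lower() for phrase in phrases for tok in phrase.split()}

    def text(self) -> str:
        # Simplification: first argument, then predicate, then remaining arguments.
        if not self.arguments:
            return self.predicate
        return " ".join([self.arguments[0], self.predicate, *self.arguments[1:]])


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-overlap similarity; a stand-in, NOT the paper's similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def drop_redundant(pas_list: list[PAS], threshold: float = 0.6) -> list[PAS]:
    """Keep a PAS only if it is not too similar to any PAS already kept."""
    kept: list[PAS] = []
    for pas in pas_list:
        if all(jaccard(pas.tokens(), k.tokens()) < threshold for k in kept):
            kept.append(pas)
    return kept


def rank_and_concatenate(pas_list: list[PAS], importance: Callable[[PAS], float]) -> str:
    """Sort PASs so higher-scoring ones come first, then join them into one document."""
    ranked = sorted(pas_list, key=importance, reverse=True)
    return ". ".join(p.text() for p in ranked) + "." if ranked else ""


# Toy input: the second PAS largely repeats the first one.
pas_a = PAS("hit", ("The storm", "the coast", "on Monday"))
pas_b = PAS("hit", ("The storm", "the coast"))
pas_c = PAS("evacuated", ("Officials", "residents"))

filtered = drop_redundant([pas_a, pas_b, pas_c])
# Placeholder importance score (token count); the paper's ranking criterion is not given here.
new_document = rank_and_concatenate(filtered, importance=lambda p: len(p.tokens()))
```

In this toy run the near-duplicate `pas_b` is dropped, and `new_document` is the shortened text that would be passed to a summarizer in the second stage.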