GLIMMER: Incorporating Graph and Lexical Features in Unsupervised Multi-Document Summarization

Ran Liu, Ming Liu, Min Yu, Jianguo Jiang, Gang Li, Dan Zhang, Jingyuan Li, Xiang Meng, Weiqing Huang
2024-08-20
Abstract: Pre-trained language models are increasingly being used in multi-document summarization tasks. However, these models need large-scale corpora for pre-training and are domain-dependent. Other non-neural unsupervised summarization approaches mostly rely on key sentence extraction, which can lead to information loss. To address these challenges, we propose a lightweight yet effective unsupervised approach called GLIMMER: a Graph and LexIcal features based unsupervised Multi-docuMEnt summaRization approach. It first constructs a sentence graph from the source documents, then automatically identifies semantic clusters by mining low-level features from raw texts, thereby improving intra-cluster correlation and the fluency of generated sentences. Finally, it summarizes clusters into natural sentences. Experiments conducted on Multi-News, Multi-XScience and DUC-2004 demonstrate that our approach outperforms existing unsupervised approaches. Furthermore, it surpasses state-of-the-art pre-trained multi-document summarization models (e.g., PEGASUS and PRIMERA) under zero-shot settings in terms of ROUGE scores. Additionally, human evaluations indicate that summaries generated by GLIMMER achieve high readability and informativeness scores. Our code is available at https://github.com/Oswald1997/GLIMMER.
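The first two stages described in the abstract (building a sentence graph, then grouping sentences into clusters) can be illustrated with a minimal sketch. This is not the GLIMMER implementation: the Jaccard word-overlap edge rule, the `threshold` value, the connected-components clustering, and the function names `tokenize`, `build_sentence_graph`, and `find_clusters` are all assumptions chosen for illustration. The final stage, turning each cluster into a natural sentence, is omitted.

```python
import re
from itertools import combinations


def tokenize(sentence):
    """Lowercase word tokens; a simple stand-in for real lexical features."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def build_sentence_graph(sentences, threshold=0.3):
    """Link two sentences when their Jaccard word overlap meets the threshold.

    The edge rule and threshold are illustrative assumptions, not the paper's.
    """
    tokens = [tokenize(s) for s in sentences]
    graph = {i: set() for i in range(len(sentences))}
    for i, j in combinations(range(len(sentences)), 2):
        union = tokens[i] | tokens[j]
        if union and len(tokens[i] & tokens[j]) / len(union) >= threshold:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def find_clusters(graph):
    """Group sentence indices into connected components of the graph."""
    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(sorted(component))
    return clusters


# Made-up example sentences, as if drawn from several source documents.
sentences = [
    "The storm hit the coast on Monday.",
    "On Monday the storm hit the coast hard.",
    "Officials opened three shelters.",
]
clusters = find_clusters(build_sentence_graph(sentences))
print(clusters)
```

In this toy input the two storm sentences share most of their words and end up in one cluster, while the shelter sentence stays on its own. A real system would likely use richer features than raw word overlap.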
Computation and Language