Abstract: Summarization for scientific text has shown significant benefits both for the research community and human society. Given the fact that the nature of scientific text is distinctive and the input of the multi-document summarization task is substantially long, the task requires sufficient embedding generation and text truncation without losing important information. To tackle these issues, in this paper, we propose SKT5SciSumm, a hybrid framework for multi-document scientific summarization (MDSS). We leverage the Sentence-Transformer version of Scientific Paper Embeddings using Citation-Informed Transformers (SPECTER) to encode and represent textual sentences, allowing for efficient extractive summarization using k-means clustering. We employ the T5 family of models to generate abstractive summaries using extracted sentences. SKT5SciSumm achieves state-of-the-art performance on the Multi-XScience dataset. Through extensive experiments and evaluation, we showcase the benefits of our model by using less complicated models to achieve remarkable results, thereby highlighting its potential in advancing the field of multi-document summarization for scientific text.

What problem does this paper attempt to address?

This paper addresses how to generate high-quality summaries in the multi-document scientific summarization task. Specifically, it focuses on the following challenges:

1. **Unique nature of scientific texts**: Scientific texts follow a specific writing style and contain academic terminology, so the summarization model must be able to interpret this content accurately.
2. **Input length of multi-document summaries**: The input to a multi-document summarization task is usually very long, so effective embedding generation and text truncation are required to handle it without losing important information.
3. **Information redundancy and cross-document relationships**: Multi-document summarization must handle duplicated information and the relationships between documents, neither of which arises in single-document summarization.

To address these challenges, the paper proposes SKT5SciSumm, a hybrid framework that combines extractive and generative (abstractive) summarization. The steps are as follows:

- **Extractive summary**: Use the Sentence-Transformer version of SPECTER (Scientific Paper Embeddings using Citation-Informed Transformers) to encode and represent sentences, then select important sentences with the k-means clustering algorithm to obtain an efficient extractive summary.
- **Generative summary**: Use the T5 family of models to generate the final summary from the extracted sentences.

With this hybrid method, SKT5SciSumm achieves state-of-the-art performance on the Multi-XScience dataset, as measured by ROUGE-1, ROUGE-2, ROUGE-L, ROUGE-LSum and BERTScore.
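The extractive step can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the two-dimensional vectors are toy stand-ins for SPECTER sentence embeddings, and keeping the sentence nearest each cluster centroid is an assumed selection rule (the summary above only says that sentences are selected through k-means clustering). In the framework described, the selected sentences would then be passed to a T5 model for the generative step.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's k-means.

    Assumption for illustration: the first k vectors seed the centroids,
    which keeps the example deterministic.
    """
    centroids = [list(v) for v in vectors[:k]]
    assignments = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Recompute each centroid; keep the old one if its cluster is empty.
        new_centroids = []
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            new_centroids.append(mean_vector(members) if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assignments


def extract_sentences(sentences, embeddings, k):
    """Keep the sentence closest to each centroid, in original order."""
    centroids, assignments = kmeans(embeddings, k)
    chosen = set()
    for c, centroid in enumerate(centroids):
        members = [i for i, a in enumerate(assignments) if a == c]
        if members:
            chosen.add(min(members, key=lambda i: squared_distance(embeddings[i], centroid)))
    return [sentences[i] for i in sorted(chosen)]


# Toy input: six sentences whose (made-up) embeddings form two groups.
toy_sentences = ["s0", "s1", "s2", "s3", "s4", "s5"]
toy_embeddings = [
    [0.3, 0.0], [5.0, 5.6], [0.0, 0.0],
    [5.3, 5.0], [0.0, 0.6], [5.0, 5.0],
]
print(extract_sentences(toy_sentences, toy_embeddings, k=2))
```

Choosing `k` fixes the number of extracted sentences, which is one way to bound the length of the input handed to the generative model.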

SKT5SciSumm -- Revisiting Extractive-Generative Approach for Multi-Document Scientific Summarization

SgSum: Transforming Multi-document Summarization into Sub-graph Selection

Disentangling Instructive Information from Ranked Multiple Candidates for Multi-Document Scientific Summarization

Synthesizing Scientific Summaries: An Extractive and Abstractive Approach

Topic-Centric Unsupervised Multi-Document Summarization of Scientific and News Articles

Large-Scale Multi-Document Summarization with Information Extraction and Compression

SciSummPip: An Unsupervised Scientific Paper Summarization Pipeline

CiteSum: Citation Text-guided Scientific Extreme Summarization and Domain Adaptation with Limited Supervision

GoSum: Extractive Summarization of Long Documents by Reinforcement Learning and Graph Organized discourse state

Summaformers @ LaySumm 20, LongSumm 20

Multi-Document Summarization Based On Two-Level Sparse Representation Model

Combination of abstractive and extractive approaches for summarization of long scientific texts

DiffuSum: Generation Enhanced Extractive Summarization with Diffusion

Leveraging Salience Analysis and Sparse Attention for Long Document Summarization

Extractive Summarization As Text Matching

A Supervised Approach to Extractive Summarisation of Scientific Papers

Rethinking Transformer-based Multi-document Summarization: An Empirical Investigation

Embrace Divergence for Richer Insights: A Multi-document Summarization Benchmark and a Case Study on Summarizing Diverse Information from News Articles

SITransformer: Shared Information-Guided Transformer for Extreme Multimodal Summarization

Enhancing Scientific Papers Summarization with Citation Graph

Data-driven Summarization of Scientific Articles