Abstract: Recent advances in low-resource abstractive summarization have largely been made through the adoption of specialized pre-training, known as pseudo-summarization, which integrates content-selection knowledge through various centrality-based sentence recovery tasks. However, despite these substantial results, there are several cases where the predecessor general-purpose pre-trained language model, BART, outperforms its summarization-specialized counterparts in both few-shot and fine-tuned scenarios. In this work, we investigate these performance irregularities and shed light on the effect of pseudo-summarization pre-training in low-resource settings. We benchmarked five pre-trained abstractive summarization models on five datasets from diverse domains and analyzed their behavior in terms of extractive intuition and attention patterns. Although all models exhibit extractive behavior, some lack the prediction confidence to copy longer text fragments and have attention distributions that are misaligned with the structure of real-world texts. The latter turns out to be the major factor behind underperformance in the fiction, news, and scientific-article domains, as BART's better initial attention alignment leads to the best benchmark results in all few-shot settings. Further examination reveals that BART's summarization capabilities are a side effect of the combination of the sentence permutation task and the specificities of its pre-training dataset. Based on this finding, we introduce Pegasus-SP, an improved pre-trained abstractive summarization model that unifies pseudo-summarization with sentence permutation. The new model outperforms existing counterparts in low-resource settings and demonstrates superior adaptability. Additionally, we show that all pre-trained summarization models benefit from data-wise attention correction, achieving up to 10% relative ROUGE improvement on the model-data pairs with the largest distribution discrepancies.
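The two pre-training signals that Pegasus-SP is described as unifying can be illustrated with a small sketch: pseudo-summarization (mask the most central sentences and ask the model to generate them) and BART-style sentence permutation (shuffle the sentence order of the input). This is a minimal illustration under stated assumptions, not the authors' implementation. Centrality is approximated by a Pegasus-style independent ROUGE-1 score of each sentence against the rest of the document; `<mask_1>` is a placeholder sentinel; the helpers `select_gap_sentences` and `make_example` are hypothetical; and shuffling the already-masked sentence sequence is an assumed way of combining the two objectives, since the abstract does not specify the exact recipe.

```python
import random
import re
from collections import Counter

# Placeholder sentinel for a masked (gap) sentence; hypothetical here.
MASK_SENT = "<mask_1>"


def tokenize(text):
    """Lowercase word tokens; a deliberately simple stand-in for a real tokenizer."""
    return re.findall(r"\w+", text.lower())


def rouge1_f1(candidate, reference):
    """Simplified ROUGE-1 F1: clipped unigram overlap between two token lists."""
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def select_gap_sentences(sentences, ratio):
    """Pick the most 'central' sentences: each one is scored independently by
    ROUGE-1 F1 against the rest of the document, and the top ones are returned
    as sorted indices (ties broken by earlier position)."""
    m = max(1, int(len(sentences) * ratio))
    scored = []
    for i, sentence in enumerate(sentences):
        rest = [tok for j, s in enumerate(sentences) if j != i for tok in tokenize(s)]
        scored.append((rouge1_f1(tokenize(sentence), rest), i))
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:m]
    return sorted(i for _, i in top)


def make_example(sentences, ratio=0.3, permute=True, seed=0):
    """Build one (source, target) pair: gap sentences are replaced by the sentinel
    and concatenated into the target; optionally the masked sentence sequence is
    shuffled as in BART's sentence permutation (combining the two is an assumption)."""
    gaps = set(select_gap_sentences(sentences, ratio))
    target = " ".join(sentences[i] for i in sorted(gaps))
    units = [MASK_SENT if i in gaps else s for i, s in enumerate(sentences)]
    if permute:
        random.Random(seed).shuffle(units)
    return " ".join(units), target


doc = ["a b c.", "a b d.", "x y z."]
print(make_example(doc, ratio=0.4, permute=False))
# ('<mask_1> a b d. x y z.', 'a b c.')
```

With `permute=False` the sketch reduces to plain gap-sentence generation; with `permute=True` the model would additionally have to recover content from a shuffled context, which is the kind of structural signal the abstract attributes to BART's sentence permutation task.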

RBPSum: an Extractive Summarization Approach Using Bi-Stream Attention and Position Residual Connection

RetrievalSum: A Retrieval Enhanced Framework for Abstractive Summarization

SummaRuNNer: A Recurrent Neural Network Based Sequence Model for Extractive Summarization of Documents

Leveraging Salience Analysis and Sparse Attention for Long Document Summarization

T-BERTSum: Topic-Aware Text Summarization Based on BERT

Abstractive Summarization Improved by WordNet-based Extractive Sentences

A study of extractive summarization of long documents incorporating local topic and hierarchical information

Towards a Robust Retrieval-Based Summarization System

Extractive Dialogue Summarization Without Annotation Based on Distantly Supervised Machine Reading Comprehension in Customer Service

Exploring Neural Models for Query-Focused Summarization

Unsupervised Extractive Summarization with Learnable Length Control Strategies

DCDSum: An interpretable extractive summarization framework based on contrastive learning method

What Have We Achieved on Text Summarization?

Deep learning-based extractive text summarization with word-level attention mechanism

Balancing Lexical and Semantic Quality in Abstractive Summarization

Investigating the Pre-Training Bias in Low-Resource Abstractive Summarization

Extractive Summarization via ChatGPT for Faithful Summary Generation

Abstractive text summarization model combining a hierarchical attention mechanism and multiobjective reinforcement learning

A Unified Model for Extractive and Abstractive Summarization using Inconsistency Loss

GoSum: Extractive Summarization of Long Documents by Reinforcement Learning and Graph Organized discourse state

Selective and Coverage Multi-head Attention for Abstractive Summarization