Abstract: Recent advances in low-resource abstractive summarization have largely come from adopting a specialized form of pre-training, pseudo-summarization, which integrates content selection knowledge through various centrality-based sentence recovery tasks. However, despite these substantial results, there are several cases in which the earlier general-purpose pre-trained language model BART outperforms its summarization-specialized counterparts in both few-shot and fine-tuned scenarios. In this work, we investigate these performance irregularities and shed some light on the effect of pseudo-summarization pre-training in low-resource settings. We benchmark five pre-trained abstractive summarization models on five datasets from diverse domains and analyze their behavior in terms of extractive intuition and attention patterns. Although all models exhibit extractive behavior, some lack the prediction confidence to copy longer text fragments, and their attention distributions are misaligned with the structure of real-world texts. This misalignment is the major factor behind underperformance in the fiction, news, and scientific article domains, as the better initial attention alignment of BART leads to the best benchmark results in all few-shot settings. Further examination reveals that BART's summarization capabilities are a side effect of combining the sentence permutation task with the specifics of its pre-training dataset. Based on this discovery, we introduce Pegasus-SP, an improved pre-trained abstractive summarization model that unifies pseudo-summarization with sentence permutation. The new model outperforms its existing counterparts in low-resource settings and demonstrates superior adaptability. Additionally, we show that all pre-trained summarization models benefit from data-wise attention correction, achieving up to 10% relative ROUGE improvement on the model-data pairs with the largest distribution discrepancies.

Text Summarization with Pretrained Encoders

T-BERTSum: Topic-Aware Text Summarization Based on BERT

Pretraining-Based Natural Language Generation for Text Summarization

Assessment of Transformer-Based Encoder-Decoder Model for Human-Like Summarization

Fine-tune BERT for Extractive Summarization

Efficient Adaptation of Pretrained Transformers for Abstractive Summarization

Investigation of Pre-Trained Bidirectional Encoder Representations from Transformers Checkpoints for Indonesian Abstractive Text Summarization

Unified extractive-abstractive summarization: a hybrid approach utilizing BERT and transformer models for enhanced document summarization

Abstractive Summarization of Spoken and Written Instructions with BERT

Combining Temporal Event Relations and Pre-Trained Language Models for Text Summarization

Efficient Two-stage Approach for Long Document Summarization

Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization

An Analysis of Abstractive Text Summarization Using Pre-trained Models

Enhancing Semantic Understanding with Self-supervised Methods for Abstractive Dialogue Summarization

Cited text span identification for scientific summarisation using pre-trained encoders

Bipartite Graph Pre-training for Unsupervised Extractive Summarization with Graph Convolutional Auto-Encoders

Abstractive method-based Text Summarization using Bidirectional Long Short-Term Memory and Pointer Generator Mode

Long Document Summarization with Top-down and Bottom-up Inference

Investigating the Pre-Training Bias in Low-Resource Abstractive Summarization

Curriculum-Guided Abstractive Summarization

On Extractive and Abstractive Neural Document Summarization with Transformer Language Models