Abstract: We characterize and study zero-shot abstractive summarization in Large Language Models (LLMs) by measuring position bias, which we propose as a general formulation of the more restrictive lead bias phenomenon studied previously in the literature. Position bias captures the tendency of a model to unfairly prioritize information from certain parts of the input text over others, leading to undesirable behavior. Through numerous experiments on four diverse real-world datasets, we study position bias in multiple LLMs such as GPT 3.5-Turbo, Llama-2, and Dolly-v2, as well as state-of-the-art pretrained encoder-decoder abstractive summarization models such as Pegasus and BART. Our findings lead to novel insights and discussion on performance and position bias of models for zero-shot summarization tasks.

What problem does this paper attempt to address?

The paper addresses position bias in zero-shot abstractive summarization with large language models (LLMs): the tendency of a model to prioritize information from certain parts of the input text while neglecting other parts. Specifically:

1. **Concept and Definition**: The paper generalizes the "lead bias" studied in previous research into a broader "position bias," defined as the phenomenon where a model unfairly prioritizes information from certain parts of the input text when generating summaries.
2. **Measurement Method**: Position bias is quantified using distribution mapping and the Wasserstein distance, among other metrics, and is proposed as a supplementary indicator for evaluating zero-shot abstractive summarization models.
3. **Experimental Validation**: Extensive experiments were conducted on four datasets (CNN/DM, XSum, Reddit, News Summary) and several models (such as GPT-3.5-Turbo, Llama-2, Dolly-v2, Pegasus, and BART) to validate the existence of position bias and its impact on model performance.
4. **Contributions and Findings**: The paper introduces the concept of position bias and methods for measuring it, providing insights that help researchers select models suited to specific tasks. In extreme summarization scenarios in particular (such as XSum), models may exhibit significant position bias.

In summary, the paper aims to improve the evaluation of zero-shot abstractive summarization by introducing and measuring position bias.
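The measurement idea in point 2 can be illustrated with a minimal sketch. It is not the paper's implementation: the sentence splitter, the Jaccard token-overlap mapping, the choice of three position bins, and all function names below are assumptions made purely for illustration. The sketch maps each summary sentence to its most similar article sentence, builds a histogram over article positions, and compares two such histograms with a 1-D Wasserstein distance.

```python
import re


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(sentence):
    # Lowercased alphanumeric tokens as a set.
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a, b):
    # Token-overlap similarity; 0.0 when both sets are empty.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def map_to_positions(article, summary):
    # For each summary sentence, return the index of the most similar
    # article sentence (hypothetical stand-in for "distribution mapping").
    art = split_sentences(article)
    art_tokens = [tokens(s) for s in art]
    positions = []
    for sent in split_sentences(summary):
        st = tokens(sent)
        best = max(range(len(art)), key=lambda i: jaccard(st, art_tokens[i]))
        positions.append(best)
    return positions, len(art)


def position_histogram(positions, n_sentences, n_bins=3):
    # Normalized histogram over equal-width position bins
    # (for example beginning, middle, end when n_bins is 3).
    if not positions:
        raise ValueError("summary produced no sentences to map")
    counts = [0] * n_bins
    for p in positions:
        counts[min(p * n_bins // n_sentences, n_bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def wasserstein_1d(p, q):
    # 1-D Wasserstein distance between two histograms on the same
    # unit-spaced bins: sum of absolute differences of their CDFs.
    dist, cdf_p, cdf_q = 0.0, 0.0, 0.0
    for a, b in zip(p, q):
        cdf_p += a
        cdf_q += b
        dist += abs(cdf_p - cdf_q)
    return dist


def position_bias(article, model_summary, reference_summary, n_bins=3):
    # Distance between where the model and the reference draw content from.
    m_pos, n = map_to_positions(article, model_summary)
    r_pos, _ = map_to_positions(article, reference_summary)
    return wasserstein_1d(
        position_histogram(m_pos, n, n_bins),
        position_histogram(r_pos, n, n_bins),
    )
```

Under these assumptions, a score near zero means the model summary draws from the same regions of the article as the reference summary, while a larger score indicates that the model concentrates on different positions, for example only the lead sentences.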

Revisiting Zero-Shot Abstractive Summarization in the Era of Large Language Models from the Perspective of Position Bias

Assessing LLMs for Zero-shot Abstractive Summarization Through the Lens of Relevance Paraphrasing

Zero-Shot Strategies for Length-Controllable Summarization

On Positional Bias of Faithfulness for Long-form Summarization

Benchmarking Large Language Models for News Summarization

Leveraging Lead Bias for Zero-shot Abstractive News Summarization

Bias in News Summarization: Measures, Pitfalls and Corpora

Large Language Models are Inconsistent and Biased Evaluators

On Context Utilization in Summarization with Large Language Models

Zero-Shot Cross-Lingual Summarization via Large Language Models

Eliminating Position Bias of Language Models: A Mechanistic Approach

Evaluating Zero-Shot Multilingual Aspect-Based Sentiment Analysis with Large Language Models

Evaluating the Factuality of Zero-shot Summarizers Across Varied Domains

On Learning to Summarize with Large Language Models as References

Earlier Isn't Always Better: Sub-aspect Analysis on Corpus and System Biases in Summarization

Attacks against Abstractive Text Summarization Models through Lead Bias and Influence Functions

Balancing Lexical and Semantic Quality in Abstractive Summarization

Large Language Models are Not Yet Human-Level Evaluators for Abstractive Summarization

Language-Independent Representations Improve Zero-Shot Summarization

Revisiting Large Language Models as Zero-shot Relation Extractors

Mitigate Position Bias in Large Language Models via Scaling a Single Dimension