Revisiting Zero-Shot Abstractive Summarization in the Era of Large Language Models from the Perspective of Position Bias

Anshuman Chhabra, Hadi Askari, Prasant Mohapatra
2024-03-19
Abstract: We characterize and study zero-shot abstractive summarization in Large Language Models (LLMs) by measuring position bias, which we propose as a general formulation of the more restrictive lead bias phenomenon studied previously in the literature. Position bias captures the tendency of a model to unfairly prioritize information from certain parts of the input text over others, leading to undesirable behavior. Through numerous experiments on four diverse real-world datasets, we study position bias in multiple LLMs such as GPT 3.5-Turbo, Llama-2, and Dolly-v2, as well as state-of-the-art pretrained encoder-decoder abstractive summarization models such as Pegasus and BART. Our findings lead to novel insights and discussion on performance and position bias of models for zero-shot summarization tasks.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses position bias in zero-shot abstractive summarization with large language models (LLMs): the tendency of a model to prioritize information from certain parts of the input text while neglecting other parts. Specifically:

1. **Concept and Definition**: The paper generalizes the "lead bias" studied in previous research into a broader "position bias," defined as the phenomenon where a model unfairly prioritizes information from certain parts of the input text when generating summaries.
2. **Measurement Method**: Position bias is quantified using distribution mapping and the Wasserstein distance, among other metrics, and is proposed as a supplementary indicator for evaluating zero-shot abstractive summarization models.
3. **Experimental Validation**: Experiments were conducted on four datasets (CNN/DM, XSum, Reddit, News Summary) and several models (GPT-3.5-Turbo, Llama-2, Dolly-v2, Pegasus, BART) to validate the existence of position bias and examine its relationship to model performance.
4. **Contributions and Findings**: The paper introduces the concept of position bias and methods for measuring it, providing insights that help researchers select models suited to specific tasks. In extreme summarization scenarios such as XSum in particular, models may exhibit significant position bias.

In summary, the paper aims to improve how zero-shot abstractive summarization is evaluated, and ultimately how well it performs, by introducing and studying position bias.
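The measurement idea summarized above (mapping summary sentences to positions in the input and comparing the resulting distributions with the Wasserstein distance) can be illustrated with a minimal sketch. This is not the paper's implementation: the naive sentence splitter, the Jaccard word-overlap matching, the choice of three position bins, and all function names (`split_sentences`, `position_distribution`, `wasserstein_1d`) are assumptions made here for illustration only.

```python
import re


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def tokens(sentence):
    # Lowercased set of alphanumeric word tokens.
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def position_distribution(article, summary, num_bins=3):
    """Map each summary sentence to its most similar article sentence
    (Jaccard overlap, an assumption) and histogram the matched positions
    into num_bins equal segments of the article."""
    art = split_sentences(article)
    if not art:
        raise ValueError("article contains no sentences")
    counts = [0] * num_bins
    for sent in split_sentences(summary):
        s_tok = tokens(sent)

        def jaccard(i):
            a_tok = tokens(art[i])
            union = s_tok | a_tok
            return len(s_tok & a_tok) / len(union) if union else 0.0

        best = max(range(len(art)), key=jaccard)
        bin_idx = min(best * num_bins // len(art), num_bins - 1)
        counts[bin_idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * num_bins


def wasserstein_1d(p, q):
    """1-D Wasserstein distance between two histograms over the same
    unit-spaced bins, computed as the summed absolute CDF difference."""
    dist, cdf_p, cdf_q = 0.0, 0.0, 0.0
    for a, b in zip(p, q):
        cdf_p += a
        cdf_q += b
        dist += abs(cdf_p - cdf_q)
    return dist
```

Under these assumptions, a position-bias score for one article could be computed as `wasserstein_1d(position_distribution(article, model_summary), position_distribution(article, reference_summary))`; a larger value would indicate that the model draws on different parts of the input than the reference summary does.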