Exploring the Limits of ChatGPT for Query or Aspect-based Text Summarization

Xianjun Yang, Yan Li, Xinlu Zhang, Haifeng Chen, Wei Cheng

2023-02-16

Abstract: Text summarization has been a crucial problem in natural language processing (NLP) for several decades. It aims to condense lengthy documents into shorter versions while retaining the most critical information. Various methods have been proposed for text summarization, including extractive and abstractive summarization. The emergence of large language models (LLMs) like GPT-3 and ChatGPT has recently created significant interest in using these models for text summarization tasks. Recent studies \cite{goyal2022news, zhang2023benchmarking} have shown that LLM-generated news summaries are already on par with human-written ones. However, the performance of LLMs in more practical applications, such as aspect-based or query-based summarization, is underexplored. To fill this gap, we evaluated ChatGPT's performance on four widely used benchmark datasets, encompassing diverse summaries from Reddit posts, news articles, dialogue meetings, and stories. Our experiments reveal that ChatGPT's performance is comparable to traditional fine-tuning methods in terms of ROUGE scores. Moreover, we highlight some unique differences between ChatGPT-generated summaries and human references, providing valuable insights into the capabilities of ChatGPT for diverse text summarization tasks. Our findings call for new directions in this area, and we plan to conduct further research to systematically examine the characteristics of ChatGPT-generated summaries through extensive human evaluation.

Computation and Language, Artificial Intelligence

What problem does this paper attempt to address?

The paper attempts to address the problem of generating text summaries based on specific aspects or queries. Although large language models (such as GPT-3 and ChatGPT) have achieved human-comparable performance on general tasks like news summarization, their performance in more practical applications, such as aspect-based or query-based summarization, has not been fully explored. Specifically, the paper aims to evaluate ChatGPT's performance in the following areas:

1. **Diverse datasets**: The paper selects four widely used benchmark datasets, covering different types of summaries from Reddit posts, news articles, dialogue meetings, and stories.
2. **Comparison with traditional methods**: Summaries generated by ChatGPT are compared with those generated by traditional fine-tuning methods using ROUGE scores.
3. **Unique differences**: The differences between ChatGPT-generated summaries and human reference summaries are analyzed to provide valuable insights.

The main contributions of the paper include:

- Systematically extending the application of large language models beyond general summarization, particularly to aspect-based or query-based summarization.
- Demonstrating that the diverse aspect-specific summaries generated by ChatGPT are comparable to those of traditional fine-tuning methods in terms of ROUGE scores.
- Proposing several potential directions for future research, based on in-depth analysis of the generated summaries, to fully leverage the advantages of large language models.

Overall, the paper aims to fill the gap in existing research by systematically evaluating ChatGPT's performance in diverse text summarization tasks and providing new perspectives for its further application in the field of natural language processing.
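Since the comparison above rests on ROUGE scores, a minimal sketch of how a ROUGE-N score can be computed may help make the metric concrete. This is an illustration only, not the paper's evaluation code: the function names `tokenize` and `rouge_n` are hypothetical, the tokenizer is a simple lowercase alphanumeric split, and standard ROUGE toolkits may additionally apply stemming or other preprocessing, so their numbers can differ.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and keep alphanumeric runs only.
    # Real ROUGE toolkits may also stem tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def ngram_counts(tokens, n):
    # Count every contiguous n-gram in the token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return ROUGE-N precision, recall, and F1 for one candidate/reference pair."""
    cand = ngram_counts(tokenize(candidate), n)
    ref = ngram_counts(tokenize(reference), n)

    # Clipped overlap: each n-gram counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())

    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

For example, `rouge_n(generated_summary, human_reference, n=1)` gives unigram overlap, and `n=2` gives bigram overlap. Scores of this kind measure lexical overlap only, which is one reason the paper also points to human evaluation as future work.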

Human-like Summarization Evaluation with ChatGPT

Comparing Abstractive Summaries Generated by ChatGPT to Real Summaries Through Blinded Reviewers and Text Classification Algorithms

Extractive Summarization via ChatGPT for Faithful Summary Generation

Hybrid Long Document Summarization using C2F-FAR and ChatGPT: A Practical Study

Automatic Code Summarization via ChatGPT: How Far Are We?

ChatGPT Application In Summarizing An Evolution Of Deep Learning Techniques In Imaging: A Qualitative Study

ChatGPT vs Human-authored Text: Insights into Controllable Text Summarization and Sentence Style Transfer

Evaluation on ChatGPT for Chinese Language Understanding

Text Summarization Using Large Language Models: A Comparative Study of MPT-7b-instruct, Falcon-7b-instruct, and OpenAI Chat-GPT Models

Summary of ChatGPT-Related Research and Perspective Towards the Future of Large Language Models

Exploring the potential of ChatGPT in medical dialogue summarization: a study on consistency with human preferences

A Survey on the Real Power of ChatGPT

A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets

Can ChatGPT Understand Too? A Comparative Study on ChatGPT and Fine-tuned BERT

Is ChatGPT a Good NLG Evaluator? A Preliminary Study

An Extensive Benchmark Study on Biomedical Text Generation and Mining with ChatGPT

Text summarization with ChatGPT for drug labeling documents

A Preliminary Study of ChatGPT on News Recommendation: Personalization, Provider Fairness, Fake News

ChatGPT Performance Evaluation on Chinese Language and Risk Measures