Text Summarization Using Large Language Models: A Comparative Study of MPT-7b-instruct, Falcon-7b-instruct, and OpenAI Chat-GPT Models

Lochan Basyal, Mihir Sanghvi

2023-10-18

Abstract: Text summarization is a critical Natural Language Processing (NLP) task with applications ranging from information retrieval to content generation. Leveraging Large Language Models (LLMs) has shown remarkable promise in enhancing summarization techniques. This paper explores text summarization with a diverse set of LLMs: MPT-7b-instruct, Falcon-7b-instruct, and OpenAI ChatGPT text-davinci-003. The experiments were performed with different hyperparameters, and the generated summaries were evaluated using widely accepted metrics: the Bilingual Evaluation Understudy (BLEU) score, the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score, and the Bidirectional Encoder Representations from Transformers (BERT) score. In these experiments, text-davinci-003 outperformed the other models. The investigation involved two distinct datasets, CNN Daily Mail and XSum, and its primary objective was to provide a comprehensive understanding of how LLMs perform when applied to different datasets. The assessment of these models' effectiveness contributes valuable insights to researchers and practitioners within the NLP domain. This work serves as a resource for those interested in harnessing the potential of LLMs for text summarization and lays the foundation for the development of advanced Generative AI applications aimed at addressing a wide spectrum of business challenges.

Computation and Language, Artificial Intelligence, Machine Learning

What problem does this paper attempt to address?

The main objective of this paper is to compare how different large language models (LLMs) perform on text summarization. Specifically, it evaluates MPT-7b-instruct, Falcon-7b-instruct, and OpenAI's ChatGPT (text-davinci-003) through comparative experiments run with different hyperparameters. The study uses two datasets, CNN/Daily Mail and XSum, and measures the quality of the generated summaries with standard evaluation metrics: BLEU score, ROUGE score, and BERT score. The core purpose of the research is to give researchers and practitioners in Natural Language Processing (NLP) a comprehensive understanding of how these models perform on text summarization across different datasets. The work is also intended as a foundation for developing advanced generative AI applications that use LLMs to address a range of business challenges. In the comparative experiments, the paper finds that OpenAI's text-davinci-003 outperforms the other models.
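To make the overlap-based metrics named above more concrete, here is a minimal Python sketch of two simplified quantities: ROUGE-1 recall and the clipped unigram precision that forms one component of BLEU. It is not the paper's evaluation code. The function names, the whitespace tokenizer, and the example sentences are assumptions chosen for illustration, and the sketch omits BLEU's higher-order n-grams and brevity penalty as well as the embedding-based BERT score, which requires a pretrained model.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a simplification; real toolkits normalize more)."""
    return text.lower().split()


def unigram_overlap(candidate, reference):
    """Count candidate unigrams that also appear in the reference, clipped by reference counts."""
    cand_counts = Counter(tokenize(candidate))
    ref_counts = Counter(tokenize(reference))
    # Counter intersection keeps the minimum count per token, which is the clipping step.
    return sum((cand_counts & ref_counts).values())


def rouge1_recall(candidate, reference):
    """Fraction of reference unigrams recovered by the candidate summary."""
    ref_len = len(tokenize(reference))
    return unigram_overlap(candidate, reference) / ref_len if ref_len else 0.0


def bleu1_precision(candidate, reference):
    """Fraction of candidate unigrams found in the reference (no brevity penalty applied)."""
    cand_len = len(tokenize(candidate))
    return unigram_overlap(candidate, reference) / cand_len if cand_len else 0.0


# Hypothetical example: a short candidate that copies half of the reference.
reference = "the cat sat on the mat"
candidate = "the cat sat"
print(rouge1_recall(candidate, reference))    # 0.5
print(bleu1_precision(candidate, reference))  # 1.0
```

The example shows why the two families of metric are usually reported together: a very short summary can reach perfect precision while recovering only half of the reference, so recall-oriented and precision-oriented scores can rank the same summaries differently.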
