Utilizing GPT to Enhance Text Summarization: A Strategy to Minimize Hallucinations

Hassan Shakil, Zeydy Ortiz, Grant C. Forbes

2024-05-07

Abstract: In this research, we use the DistilBERT model to generate extractive summaries and the T5 model to generate abstractive summaries. We also generate hybrid summaries by combining the DistilBERT and T5 models. Central to our research is a GPT-based refining process that minimizes the common problem of hallucinations in AI-generated summaries. We evaluate the summaries both before and after refinement using a range of traditional and novel metrics, demonstrating marked improvements in their accuracy and reliability. The results highlight significant improvements in reducing hallucinatory content, thereby increasing the factual integrity of the summaries.

Computation and Language, Artificial Intelligence, Machine Learning

What problem does this paper attempt to address?

The problem this paper attempts to solve is reducing "hallucination" in text summarization, that is, content in AI-generated summaries that does not match the source text. The authors use the DistilBERT model to generate extractive summaries and the T5 model to generate abstractive summaries, and combine the two methods to generate hybrid summaries. To further reduce hallucination, they introduce a GPT-based refinement process in which the generated summaries are evaluated and then revised to improve their accuracy and reliability. Specifically, the paper focuses on the following aspects:

1. **Generating unrefined summaries**: Use DistilBERT to generate extractive summaries, use T5 to generate abstractive summaries, and combine the two to generate hybrid summaries.
2. **GPT-based refinement process**: Evaluate and refine the generated summaries with a GPT model to reduce hallucinatory content.
3. **Evaluation metrics**: Use a variety of traditional and new evaluation metrics (such as FactSumm, QAGS, SummaC, ROUGE, and GPT-3.5 Turbo) to assess the quality of the unrefined and refined summaries.
4. **Statistical analysis**: Verify the effectiveness of the refinement process with statistical methods such as paired t-tests.

The goal of the paper is to significantly reduce hallucination in summaries, and thereby improve their accuracy and reliability, through these methods.
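The statistical analysis step can be illustrated with a minimal sketch of a paired t-test, which compares a metric computed on the same documents before and after refinement. The function name `paired_t_statistic` and the score values below are hypothetical and chosen only for illustration; they are not taken from the paper, and the authors' actual analysis code may differ.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for a paired t-test on matched scores.

    `before` and `after` hold one score per document, in the same order.
    """
    if len(before) != len(after):
        raise ValueError("score lists must have the same length")
    if len(before) < 2:
        raise ValueError("need at least two paired observations")

    # Per-document change: refined score minus unrefined score.
    diffs = [a - b for a, b in zip(after, before)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")

    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical factual-consistency scores for five documents (not real data).
unrefined = [0.50, 0.60, 0.55, 0.70, 0.65]
refined = [0.60, 0.65, 0.70, 0.75, 0.80]

t_stat, dof = paired_t_statistic(unrefined, refined)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof}")
```

A paired test is used here rather than an independent two-sample test because each refined summary is matched to the unrefined summary of the same source document, so the test operates on the per-document differences. The resulting statistic would then be compared against the t distribution with the reported degrees of freedom to obtain a p-value.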

Improving Faithfulness in Abstractive Summarization with Contrast Candidate Generation and Selection

Reducing Quantity Hallucinations in Abstractive Summarization

Evaluating Text Summaries Generated by Large Language Models Using OpenAI's GPT

A Data-Centric Approach To Generate Faithful and High Quality Patient Summaries with Large Language Models

Factored Verification: Detecting and Reducing Hallucination in Summaries of Academic Papers

Correction with Backtracking Reduces Hallucination in Summarization

Don't Believe Everything You Read: Enhancing Summarization Interpretability through Automatic Identification of Hallucinations in Large Language Models

Tackling Hallucinations in Neural Chart Summarization

Metric Ensembles For Hallucination Detection

Training Dynamics for Text Summarization Models

FaithBench: A Diverse Hallucination Benchmark for Summarization by Modern LLMs

TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization

From Single to Multi: How LLMs Hallucinate in Multi-Document Summarization

Abstract Summarization Model Based on Semantic Graphs and Entity Pointers

Extractive Summarization via ChatGPT for Faithful Summary Generation

Characterizing Multimodal Long-form Summarization: A Case Study on Financial Reports

Distilled GPT for source code summarization

Detecting and Mitigating Hallucinations in Multilingual Summarisation

Assessment of Transformer-Based Encoder-Decoder Model for Human-Like Summarization

Synthetic Imitation Edit Feedback for Factual Alignment in Clinical Summarization