Abstract: Hallucination is a key roadblock for applications of Large Language Models (LLMs), particularly for enterprise applications that are sensitive to information accuracy. To address this issue, two general approaches have been explored: Retrieval-Augmented Generation (RAG) to supply LLMs with updated information as context, and fine-tuning the LLMs with new information and desired output styles. In this paper, we propose Honest AI: a novel strategy to fine-tune "small" language models to say "I don't know" to reduce hallucination, along with several alternative RAG approaches. The solution ranked 1st in Task 2 for the false premise question. The alternative approaches include using RAG with search engine and knowledge graph results, fine-tuning base LLMs with new information, and combinations of both approaches. Although all approaches improve the performance of the LLMs, RAG alone does not significantly improve the performance and fine-tuning is needed for better results. Finally, the hybrid approach achieved the highest score in the CRAG benchmark. In addition, our approach emphasizes the use of relatively small models with fewer than 10 billion parameters, promoting resource efficiency.

What problem does this paper attempt to address?

This paper addresses the **hallucination problem** in large language models (LLMs), especially in enterprise applications that are sensitive to information accuracy. Specifically:

1. **The impact of the hallucination problem**:
   - Hallucination means that a large language model may generate answers that contradict the facts or are entirely fabricated. This is a key obstacle in applications that require high accuracy, such as question-answering systems in finance, medicine, and similar fields.
2. **The shortcomings of existing methods**:
   - Two general approaches are currently used to mitigate hallucination:
     - **Retrieval-Augmented Generation (RAG)**: supply up-to-date context by retrieving from external data sources.
     - **Fine-tuning**: fine-tune the LLM with new information and the desired output style.
   - However, RAG alone does not significantly improve performance; it needs to be combined with fine-tuning to achieve better results.
3. **The proposed solution**:
   - **Honest AI strategy**: The authors propose fine-tuning "small" language models (fewer than 10 billion parameters) so that they learn to say "I don't know" when they are uncertain, thereby reducing hallucination.
   - **Hybrid method**: Combine RAG and fine-tuning to further improve model performance. The experimental results show that this hybrid method achieved the highest score on the CRAG benchmark.
4. **Experimental verification**:
   - The authors ran their experiments in the 2024 Meta KDD Cup competition, where the solution ranked first in Task 2 on false premise questions.
Through these methods, the authors aim to improve the reliability and accuracy of large language models in practical applications, especially when dealing with complex and changing questions.
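The idea of training a model to abstain can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `Example` fields, the `model_can_answer` flag, the exact-match comparison, and the scoring rule (+1 for a correct answer, 0 for "I don't know", -1 for a wrong answer). It shows one plausible way to label fine-tuning targets and why abstaining can score higher than guessing when wrong answers are penalized.

```python
from dataclasses import dataclass

IDK = "I don't know"

@dataclass
class Example:
    question: str
    reference: str          # gold answer
    model_can_answer: bool  # hypothetical flag: is the model reliable on this item?

def make_target(ex: Example) -> str:
    """Fine-tuning label: keep the gold answer when reliable, otherwise abstain."""
    return ex.reference if ex.model_can_answer else IDK

def score(prediction: str, reference: str) -> int:
    """Assumed scoring rule: +1 correct, 0 abstain, -1 wrong (hallucinated)."""
    pred = prediction.strip().lower()
    if pred == IDK.lower():
        return 0
    return 1 if pred == reference.strip().lower() else -1

def mean_score(predictions: list[str], references: list[str]) -> float:
    """Average score over paired predictions and references."""
    pairs = list(zip(predictions, references))
    return sum(score(p, r) for p, r in pairs) / len(pairs)

# Toy comparison: one model guesses on every question, the other abstains
# on the two questions it cannot answer.
references = ["Paris", "1969", "invalid question"]
guessing = ["Paris", "1970", "Bob"]
abstaining = ["Paris", IDK, IDK]

print(mean_score(guessing, references))
print(mean_score(abstaining, references))
```

Under this assumed rule, a wrong answer costs more than an abstention, so a model that declines to answer when unsure ends up with the higher average even though it answers fewer questions.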

Honest AI: Fine-Tuning "Small" Language Models to Say "I Don't Know", and Reducing Hallucination in RAG

Minimizing Factual Inconsistency and Hallucination in Large Language Models

Enhancing LLM Factual Accuracy with RAG to Counter Hallucinations: A Case Study on Domain-Specific Queries in Private Knowledge-Bases

RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models

Beyond Fine-Tuning: Effective Strategies for Mitigating Hallucinations in Large Language Models for Data Analytics

Adaptive Activation Steering: A Tuning-Free LLM Truthfulness Improvement Method for Diverse Hallucinations Categories

Mitigating Large Language Model Hallucination with Faithful Finetuning

Navigating Uncertainty: Optimizing API Dependency for Hallucination Reduction in Closed-Book Question Answering

A Debate-Driven Experiment on LLM Hallucinations and Accuracy

Towards Mitigating Hallucination in Large Language Models via Self-Reflection

Retrieve Only When It Needs: Adaptive Retrieval Augmentation for Hallucination Mitigation in Large Language Models

Iter-AHMCL: Alleviate Hallucination for Large Language Model via Iterative Model-level Contrastive Learning

Is There No Such Thing as a Bad Question? H4R: HalluciBot For Ratiocination, Rewriting, Ranking, and Routing

Hallucination Detection: Robustly Discerning Reliable Answers in Large Language Models

AssistRAG: Boosting the Potential of Large Language Models with an Intelligent Information Assistant

Cost-Effective Hallucination Detection for LLMs

Fine-grained Hallucination Detection and Editing for Language Models

Learning to Trust Your Feelings: Leveraging Self-awareness in LLMs for Hallucination Mitigation

Teaching Language Models to Hallucinate Less with Synthetic Tasks

Lower Layer Matters: Alleviating Hallucination via Multi-Layer Fusion Contrastive Decoding with Truthfulness Refocused