Abstract: In this paper we present an approach to reduce hallucinations in Large Language Models (LLMs) by incorporating Knowledge Graphs (KGs) as an additional modality. Our method involves transforming input text into a set of KG embeddings and using an adapter to integrate these embeddings into the language model space, without relying on external retrieval processes. To facilitate this, we created WikiEntities, a dataset containing over 3 million Wikipedia texts annotated with entities from Wikidata and their corresponding embeddings from PyTorch-BigGraph. This dataset serves as a valuable resource for training Entity Linking models and adapting the described method to various LLMs using specialized adapters. Our method does not require fine-tuning of the language models themselves; instead, we only train the adapter. This ensures that the model's performance on other tasks is not affected. We trained an adapter for the Mistral 7B, LLaMA 2-7B (chat), and LLaMA 3-8B (instruct) models using this dataset and demonstrated that our approach improves performance on the HaluEval and True-False benchmarks and on the FEVER dataset. The results indicate that incorporating KGs as a new modality can effectively reduce hallucinations and improve the factual accuracy of language models, all without the need for external retrieval.

### What problem does this paper attempt to address?

This paper addresses hallucinations in large language models (LLMs). A hallucination is inaccurate or fabricated factual information in text generated by the model. Despite significant progress in training methods and dialogue techniques, LLMs remain prone to hallucinations, which undermines the credibility and accuracy of their output.

Specifically, the authors propose reducing hallucinations by introducing knowledge graphs (KGs) as an additional modality. The core idea is to convert the input text into KG embeddings and use an adapter to integrate these embeddings into the language model space, without relying on an external retrieval process.

### Main contributions

1. **WikiEntities dataset**: The authors created a dataset of more than 3 million Wikipedia texts, each annotated with entities from Wikidata and their corresponding PyTorch-BigGraph embeddings. The dataset can be used to train entity linking models and to adapt the method to other LLMs by training model-specific adapters.
2. **Introduction of KG modality**: The authors added KG information as an additional modality to Mistral 7B, LLaMA 2-7B (chat), and LLaMA 3-8B (instruct). Because only the adapter is trained and the language model itself is not fine-tuned, the method reduces hallucinations and improves factual accuracy without affecting the model's performance on other tasks.

### Method overview

- **Text2Graph Mapper**: Maps the input text to the KG embedding space.
- **Adapter**: Converts KG embeddings into language model embeddings.
- **Special tokens**: Two special tokens, `<GRAPH_START>` and `<GRAPH_END>`, enclose the additional KG modality.

In this way, the authors reduce hallucinations and improve the models' results on several benchmarks (HaluEval, True-False, and FEVER).
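The data flow described in the method overview can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the linear form of the adapter, the toy dimensions, the placement of the graph block before the text, and every function and variable name below are assumptions made for illustration. The Text2Graph mapper is not modeled; its output is replaced by random vectors.

```python
import random

KG_DIM = 4   # assumed toy size of a KG embedding (real ones are much larger)
LM_DIM = 6   # assumed toy size of the language model's embedding space


def make_linear(in_dim, out_dim, rng):
    """Create a random (out_dim x in_dim) weight matrix and a zero bias."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
               for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def apply_linear(layer, vector):
    """Project one vector: y = W x + b."""
    weights, bias = layer
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def build_input_sequence(text_embeddings, kg_embeddings, adapter,
                         graph_start, graph_end):
    """Wrap adapted KG embeddings in the two marker embeddings.

    Assumption: the graph block is placed before the text embeddings;
    the summary above does not state the position.
    """
    projected = [apply_linear(adapter, e) for e in kg_embeddings]
    return [graph_start] + projected + [graph_end] + text_embeddings


rng = random.Random(0)
adapter = make_linear(KG_DIM, LM_DIM, rng)  # the only trained part in the paper

# Stand-ins for the embeddings of <GRAPH_START> and <GRAPH_END>.
graph_start = [rng.uniform(-1, 1) for _ in range(LM_DIM)]
graph_end = [rng.uniform(-1, 1) for _ in range(LM_DIM)]

# Stand-ins for the Text2Graph mapper output (2 entities) and for the
# language model's own token embeddings (3 tokens).
kg_embeddings = [[rng.uniform(-1, 1) for _ in range(KG_DIM)] for _ in range(2)]
text_embeddings = [[rng.uniform(-1, 1) for _ in range(LM_DIM)] for _ in range(3)]

sequence = build_input_sequence(text_embeddings, kg_embeddings, adapter,
                                graph_start, graph_end)
print(len(sequence), len(sequence[0]))  # 7 positions, each of size LM_DIM
```

The point of the sketch is that every position handed to the language model lives in the same embedding space, so the model itself can stay frozen while only the adapter's weights are updated.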
### Conclusion

The results indicate that introducing KGs as an additional modality can reduce hallucinations in language models and improve the factual accuracy of generated content while preserving performance on other tasks. In addition, the WikiEntities dataset provides a valuable resource for training entity linking models and for applying the KG integration method to other language models.

Addressing Hallucinations in Language Models with Knowledge Graph Embeddings as an Additional Modality
