Abstract: Large Language Models (LLMs) might hallucinate facts, while curated Knowledge Graphs (KGs) are typically factually reliable, especially for domain-specific knowledge. Measuring the alignment between KGs and LLMs can effectively probe the factuality of LLMs and identify their knowledge blind spots. However, verifying LLMs against extensive KGs can be expensive. In this paper, we present KGLens, a Thompson-sampling-inspired framework for measuring the alignment between KGs and LLMs effectively and efficiently. KGLens features a graph-guided question generator for converting KGs into natural language, along with a carefully designed importance sampling strategy based on a parameterized KG structure to expedite KG traversal. Our simulation experiment compares the brute-force method with KGLens under six different sampling methods, demonstrating that our approach achieves superior probing efficiency. Leveraging KGLens, we conducted in-depth analyses of the factual accuracy of ten LLMs across three large domain-specific KGs from Wikidata, comprising over 19K edges, 700 relations, and 21K entities. Human evaluation results indicate that KGLens can assess LLMs with a level of accuracy nearly equivalent to that of human annotators, achieving 95.7% of the accuracy rate.

What problem does this paper attempt to address?

The paper addresses the factual accuracy of large language models (LLMs), in particular the factual errors, hallucinations, and outdated information they may exhibit. It proposes a new framework, KGLens, which uses knowledge graphs (KGs) to assess the knowledge alignment of LLMs efficiently and effectively and to identify their knowledge blind spots. Specifically, the paper focuses on the following points:

1. **Challenges and Limitations**: Existing fact-checking and fact-based question-answering methods have limitations, including the difficulty of excluding test data from training to ensure fair evaluation and of handling the different ways an LLM may express the same fact.
2. **Solution**: Converting knowledge graphs into natural language questions to evaluate LLMs can overcome these challenges. However, generating questions directly from knowledge graphs raises its own efficiency and ambiguity issues.
3. **KGLens Framework**: The framework combines graph-guided question generation with a Thompson-sampling-inspired method that selects which KG edges to evaluate. It also proposes a parameterized knowledge graph (PKG) to estimate an LLM's deficiencies in specific knowledge areas.
4. **Experiments and Evaluation**: The paper presents the results of evaluating various LLMs, including multiple versions of GPT-3.5 and GPT-4, on knowledge graphs from three different domains. Evaluation metrics include zero-awareness rate, full-awareness rate, and win rate, which help identify the reliability and knowledge gaps of LLMs.
5. **Contributions**: The main contributions are a novel and efficient knowledge probing framework, a strategy to reduce question ambiguity, and validation of the framework's effectiveness through human evaluation.
In summary, the paper aims to improve the efficiency and accuracy of knowledge assessment for LLMs through the KGLens framework, thereby promoting the development of more reliable and truthful AI systems.
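
The Thompson-sampling-inspired edge selection summarized in point 3 can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes each KG edge carries a Beta(alpha, beta) belief over the chance that the LLM answers it incorrectly, and that `ask_llm` is a hypothetical stand-in for question generation plus answer checking. The names `EdgeBelief` and `probe` and all parameters are likewise illustrative.

```python
import random
from dataclasses import dataclass


@dataclass
class EdgeBelief:
    """Assumed Beta(alpha, beta) belief that the LLM fails on this edge."""
    alpha: float = 1.0  # pseudo-count of observed failures
    beta: float = 1.0   # pseudo-count of observed successes

    def sample(self, rng):
        # Draw one plausible failure probability from the current belief.
        return rng.betavariate(self.alpha, self.beta)

    def update(self, failed):
        # Standard Beta-Bernoulli posterior update.
        if failed:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self):
        return self.alpha / (self.alpha + self.beta)


def probe(edges, ask_llm, rounds, batch_size, seed=0):
    """Repeatedly query the edges whose sampled failure probability is highest.

    ask_llm(edge) is a hypothetical callback returning True when the model
    answers the question derived from the edge correctly.
    """
    rng = random.Random(seed)
    beliefs = {edge: EdgeBelief() for edge in edges}
    for _ in range(rounds):
        # Thompson step: rank edges by a random draw from each belief.
        ranked = sorted(edges, key=lambda e: beliefs[e].sample(rng), reverse=True)
        for edge in ranked[:batch_size]:
            correct = ask_llm(edge)
            beliefs[edge].update(failed=not correct)
    return beliefs


# Toy usage: a fake "LLM" that does not know any 'capital_of' fact.
edges = [
    ("Paris", "capital_of", "France"),
    ("Seine", "flows_through", "Paris"),
    ("Louvre", "located_in", "Paris"),
]
beliefs = probe(edges, ask_llm=lambda e: e[1] != "capital_of", rounds=50, batch_size=1)
for edge, belief in beliefs.items():
    print(edge, round(belief.mean, 3))
```

Because edges are ranked by a random draw and not by the posterior mean, edges that have rarely been queried still get explored, while edges the model keeps failing are revisited more often. This is the usual exploration and exploitation trade-off of Thompson sampling, and it avoids querying every edge as the brute-force approach would.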

KGLens: Towards Efficient and Effective Knowledge Probing of Large Language Models with Knowledge Graphs
