KGLens: Towards Efficient and Effective Knowledge Probing of Large Language Models with Knowledge Graphs

Shangshang Zheng, He Bai, Yizhe Zhang, Yi Su, Xiaochuan Niu, Navdeep Jaitly
2024-08-01
Abstract: Large Language Models (LLMs) might hallucinate facts, while curated Knowledge Graphs (KGs) are typically factually reliable, especially with domain-specific knowledge. Measuring the alignment between KGs and LLMs can effectively probe the factualness and identify the knowledge blind spots of LLMs. However, verifying LLMs over extensive KGs can be expensive. In this paper, we present KGLens, a Thompson-sampling-inspired framework aimed at effectively and efficiently measuring the alignment between KGs and LLMs. KGLens features a graph-guided question generator for converting KGs into natural language, along with a carefully designed importance sampling strategy based on a parameterized KG structure to expedite KG traversal. Our simulation experiment compares the brute-force method with KGLens under six different sampling methods, demonstrating that our approach achieves superior probing efficiency. Leveraging KGLens, we conducted in-depth analyses of the factual accuracy of ten LLMs across three large domain-specific KGs from Wikidata, comprising over 19K edges, 700 relations, and 21K entities. Human evaluation results indicate that KGLens can assess LLMs with a level of accuracy nearly equivalent to that of human annotators, achieving 95.7% of the accuracy rate.
Artificial Intelligence, Computation and Language, Machine Learning
What problem does this paper attempt to address?
The paper addresses the factual accuracy of large language models (LLMs), which may produce factual errors, hallucinations, or outdated information. It proposes a framework called KGLens, which uses knowledge graphs (KGs) to assess the knowledge alignment of LLMs efficiently and effectively and to identify their knowledge blind spots. Specifically, the paper focuses on the following points:
1. **Challenges and Limitations**: Existing fact-checking and fact-based question-answering methods have limitations, including difficulties in excluding test data to ensure fair evaluation and in handling different expressions of the same fact by LLMs.
2. **Solution**: Converting knowledge graphs into natural language questions to evaluate LLMs can overcome these challenges. However, generating questions directly from knowledge graphs raises efficiency and ambiguity issues of its own.
3. **KGLens Framework**: The framework combines graph-guided question generation with a Thompson-sampling-inspired method that efficiently selects which KG edges to test against the LLM. It also proposes a parameterized knowledge graph (PKG) to estimate an LLM's deficiencies in specific knowledge areas.
4. **Experiments and Evaluation**: The paper evaluates various LLMs, including multiple versions of GPT-3.5 and GPT-4, on knowledge graphs from three different domains. Evaluation metrics include zero-awareness rate, full-awareness rate, and win rate, which help identify the reliability and knowledge gaps of LLMs.
5. **Contributions**: The main contributions are a novel and efficient knowledge probing framework, a strategy to reduce question ambiguity, and validation of the framework's effectiveness through human evaluation.
In summary, the paper aims to improve the efficiency and accuracy of knowledge assessment for LLMs through the KGLens framework, thereby promoting the development of more reliable and truthful AI systems.
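The Thompson-sampling-inspired edge selection mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each KG edge carries a Beta distribution over the chance that the LLM answers it incorrectly, and every name here (`EdgeBelief`, `probe`, `toy_llm`, the toy edges) is hypothetical. In each round the sketch samples a failure probability per edge, probes the edges with the highest samples, and updates their beliefs with the outcome, so that probing effort drifts toward edges the model tends to get wrong.

```python
import random
from dataclasses import dataclass


@dataclass
class EdgeBelief:
    """Assumed Beta(alpha, beta) belief that the LLM fails on one KG edge."""
    alpha: float = 1.0  # 1 + number of observed failures
    beta: float = 1.0   # 1 + number of observed successes

    def sample(self, rng: random.Random) -> float:
        # Draw a plausible failure probability from the current belief.
        return rng.betavariate(self.alpha, self.beta)

    def update(self, failed: bool) -> None:
        if failed:
            self.alpha += 1
        else:
            self.beta += 1

    @property
    def mean(self) -> float:
        # Posterior mean estimate of the failure probability.
        return self.alpha / (self.alpha + self.beta)


def probe(edges, ask_llm, rounds, batch_size, seed=0):
    """Probe `batch_size` edges per round, chosen by sampled failure probability."""
    rng = random.Random(seed)
    beliefs = {edge: EdgeBelief() for edge in edges}
    for _ in range(rounds):
        # Thompson-style step: rank edges by one random draw from each belief.
        ranked = sorted(edges, key=lambda e: beliefs[e].sample(rng), reverse=True)
        for edge in ranked[:batch_size]:
            correct = ask_llm(edge)  # hypothetical call: True if the LLM got it right
            beliefs[edge].update(failed=not correct)
    return beliefs


# Toy data (invented for illustration): (subject, relation, object) triples.
edges = [
    ("Paris", "capital_of", "France"),
    ("Berlin", "capital_of", "Germany"),
    ("Marie Curie", "award_received", "Nobel Prize in Physics"),
    ("Alan Turing", "award_received", "Smith's Prize"),
]


def toy_llm(edge):
    # Stand-in for a real model: always wrong on "award_received" edges.
    return edge[1] != "award_received"


rounds, batch_size = 30, 2
beliefs = probe(edges, toy_llm, rounds=rounds, batch_size=batch_size)
for edge, belief in beliefs.items():
    print(edge[0], edge[1], round(belief.mean, 3))
```

With a real model, `ask_llm` would wrap question generation from the edge and answer verification. The sketch treats edges independently, whereas the parameterized KG described in the paper is based on the graph structure, so that aspect is not modeled here.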