Abstract: To evaluate knowledge in large language models (LLMs), current methods query the model and then evaluate its generated responses. In this work, we ask whether evaluation can be done before the model has generated any text. Concretely, is it possible to estimate how knowledgeable a model is about a certain entity, only from its internal computation? We study this question with two tasks: given a subject entity, the goal is to predict (a) the ability of the model to answer common questions about the entity, and (b) the factuality of open-ended responses generated by the model about the entity. Experiments with a variety of LLMs show that KEEN, a simple probe trained over internal subject representations, succeeds at both tasks - correlating with both the QA accuracy of the model per-subject and FActScore, a recent factuality metric in open-ended generation. Moreover, KEEN naturally aligns with the model's hedging behavior and faithfully reflects changes in the model's knowledge after fine-tuning. Lastly, we show a more interpretable yet equally performant variant of KEEN, which highlights a small set of tokens indicative of clusters and gaps in the model's knowledge. Being simple and lightweight, KEEN can be leveraged to guide decisions such as when it is appropriate to apply further training or augment queries with retrieval.

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

This paper aims to explore how to assess the knowledge level of large language models (LLMs) before they generate any text. Specifically, the authors propose a method to estimate the model's knowledge of specific entities by analyzing its internal representations. The paper mainly investigates two tasks:

1. **Predicting the model's ability to answer common questions about entities**: Given a subject entity (e.g., Napoleon or the Empire State Building), the goal is to predict the number of questions related to that entity that the model can answer correctly.
2. **Predicting the factual accuracy of the model's open-ended responses**: Given a general information query about a subject (e.g., "Tell me facts about Napoleon" or "Generate a paragraph about Napoleon"), the goal is to predict the proportion of factually correct statements in the model's generated response.

### Main Contributions

- **KEEN Probe**: The authors propose a simple probe (KEEN, i.e., Knowledge Estimation for Entities) to estimate the model's knowledge level of specific entities by training on the model's internal representations. The KEEN probe can predict the model's accuracy in question-answering tasks and the factual accuracy in open-ended generation tasks.
- **Experimental Validation**: Through experiments on different sizes and types of LLMs, the KEEN probe demonstrates a strong correlation with question-answering accuracy and factual accuracy across multiple models.
- **Interpretability**: The KEEN probe not only performs well but also has high interpretability, identifying key terms that influence the model's knowledge level.

### Experimental Setup

- **Dataset**: For the question-answering task, 3,472 subject entities were extracted from the PopQA dataset, generating an average of 5.3 questions per subject. For the open-ended generation task, the FActScore dataset was used, which includes model-generated biographies, extracted statements, and their correctness labels.
- **Baseline Methods**: The authors compared the KEEN probe with baseline methods based on intrinsic features (such as fully connected activations and self-attention outputs) and extrinsic features (such as entity popularity).

### Results

- **Correlation**: The KEEN probe outperforms baseline methods across all models, with correlations of 0.60-0.68 with question-answering accuracy and 0.66-0.77 with factual accuracy.
- **Generalization Ability**: The KEEN probe not only excels in question-answering tasks but also predicts factual accuracy in open-ended generation tasks, demonstrating its generalization ability across different tasks.
- **Model Hedging Behavior**: The KEEN probe's scores are positively correlated with the model's hedging behavior (i.e., generating responses like "I don't know" when uncertain), indicating that the model may hedge based on features of its internal representations.
- **Reflection of Knowledge Changes**: By fine-tuning the model, the KEEN probe can reflect changes in the model's knowledge, particularly showing increased scores for target entities after fine-tuning, while non-target entities' scores remain relatively stable.

### Conclusion

The KEEN probe is a simple and lightweight method to quantify a model's knowledge level of specific entities through its internal representations. This method not only excels in assessing the model's knowledge level but also has high interpretability and generalization ability. The KEEN probe can guide developers' decisions, such as whether further training or query enhancement is needed.
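
To make the probing setup described above more concrete, here is a minimal sketch of the general idea: compute a per-subject QA accuracy target, fit a small probe on per-subject feature vectors, and report the Pearson correlation on held-out subjects. Everything in it is an assumption for illustration. The function names are hypothetical, the features are random synthetic vectors standing in for a model's internal subject representations, and the probe form (a linear layer with a sigmoid, trained with mean squared error) is one plausible choice rather than a detail confirmed by this summary.

```python
import numpy as np


def qa_accuracy_per_subject(correct_flags):
    """Map each subject to the fraction of its questions answered correctly."""
    return {s: sum(flags) / len(flags) for s, flags in correct_flags.items()}


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_probe(X, y, lr=0.5, epochs=2000):
    """Fit sigmoid(X @ w + b) to targets in [0, 1] by gradient descent on MSE."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        # Gradient of mean((p - y)^2) with respect to the pre-sigmoid logits.
        grad_z = 2.0 * (p - y) * p * (1.0 - p) / n
        w -= lr * (X.T @ grad_z)
        b -= lr * grad_z.sum()
    return w, b


def predict(X, w, b):
    """Probe score in [0, 1] for each row (subject) of X."""
    return sigmoid(X @ w + b)


def pearson(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


# Toy target computation: 3 of 4 questions about one subject answered correctly.
acc = qa_accuracy_per_subject({"Napoleon": [1, 1, 0, 1]})

# Synthetic stand-in data (NOT real model activations): 400 "subjects",
# each with a 16-dimensional feature vector and a noisy accuracy-like target.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 16))
true_w = 0.5 * rng.normal(size=16)
y = sigmoid(X @ true_w + 0.5 * rng.normal(size=400))

# Train on 300 subjects, evaluate correlation on the remaining 100.
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

w, b = train_probe(X_train, y_train)
scores = predict(X_test, w, b)
r_test = pearson(scores, y_test)
print(f"held-out Pearson correlation: {r_test:.2f}")
```

In a real setting, the rows of `X` would be replaced by representations read from the model for each subject, and `y` by measured per-subject QA accuracy or factuality scores; the correlation on this synthetic data says nothing about the numbers reported in the paper.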

Estimating Knowledge in Large Language Models Without Generating a Single Token
