Estimating Knowledge in Large Language Models Without Generating a Single Token

Daniela Gottesman, Mor Geva
2024-10-29
Abstract: To evaluate knowledge in large language models (LLMs), current methods query the model and then evaluate its generated responses. In this work, we ask whether evaluation can be done before the model has generated any text. Concretely, is it possible to estimate how knowledgeable a model is about a certain entity, only from its internal computation? We study this question with two tasks: given a subject entity, the goal is to predict (a) the ability of the model to answer common questions about the entity, and (b) the factuality of open-ended responses generated by the model about the entity. Experiments with a variety of LLMs show that KEEN, a simple probe trained over internal subject representations, succeeds at both tasks, correlating with both the QA accuracy of the model per-subject and FActScore, a recent factuality metric in open-ended generation. Moreover, KEEN naturally aligns with the model's hedging behavior and faithfully reflects changes in the model's knowledge after fine-tuning. Lastly, we show a more interpretable yet equally performant variant of KEEN, which highlights a small set of tokens indicative of clusters and gaps in the model's knowledge. Being simple and lightweight, KEEN can be leveraged to guide decisions such as when it is appropriate to apply further training or augment queries with retrieval.
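The probe described in the abstract can be sketched as a small regression model over a subject's internal representation. The snippet below is a minimal, hypothetical illustration, not the authors' code: it assumes each subject is represented by the mean of the hidden-state vectors of its tokens, and that the probe is a linear layer followed by a sigmoid, fit with mean squared error to a per-subject target in [0, 1] (such as QA accuracy). The helper names (`mean_pool`, `keen_score`, `train_probe`, `mse`) and the toy data are invented for illustration; in practice the vectors would be read from an LLM's intermediate layers.

```python
import math
import random

def mean_pool(token_vectors):
    """Average the hidden-state vectors of a subject's tokens into one representation."""
    n = len(token_vectors)
    return [sum(column) / n for column in zip(*token_vectors)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def keen_score(w, b, x):
    """Hypothetical probe output: sigmoid(w . x + b), a knowledge estimate in (0, 1)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_probe(features, targets, lr=0.5, epochs=300):
    """Fit w, b by full-batch gradient descent on mean squared error."""
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, targets):
            p = keen_score(w, b, x)
            # Derivative of (p - y)^2 with respect to the pre-sigmoid logit.
            g = 2.0 * (p - y) * p * (1.0 - p)
            for i in range(dim):
                grad_w[i] += g * x[i]
            grad_b += g
        w = [wi - lr * gw / n for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def mse(w, b, features, targets):
    """Mean squared error between probe scores and per-subject targets."""
    return sum((keen_score(w, b, x) - y) ** 2 for x, y in zip(features, targets)) / len(targets)

# Toy stand-in data: 200 "subjects", each with 3 token vectors of dimension 4.
random.seed(0)
hidden_w = [1.5, -2.0, 0.5, 0.0]  # synthetic ground truth, for illustration only
features, targets = [], []
for _ in range(200):
    tokens = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(3)]
    x = mean_pool(tokens)
    features.append(x)
    targets.append(sigmoid(sum(hw * xi for hw, xi in zip(hidden_w, x))))

w, b = train_probe(features, targets)
print(f"MSE before: {mse([0.0] * 4, 0.0, features, targets):.4f}, "
      f"after: {mse(w, b, features, targets):.4f}")
```

Because such a probe only reads internal representations of the subject, it can score an entity before the model generates any token.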
Computation and Language
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper explores how to assess the knowledge of large language models (LLMs) before they generate any text. Specifically, the authors propose estimating a model's knowledge of a given entity by analyzing its internal representations. The paper investigates two tasks:

1. **Predicting the model's ability to answer common questions about an entity**: Given a subject entity (e.g., Napoleon or the Empire State Building), the goal is to predict how many questions about that entity the model can answer correctly.
2. **Predicting the factual accuracy of the model's open-ended responses**: Given a general information query about a subject (e.g., "Tell me facts about Napoleon" or "Generate a paragraph about Napoleon"), the goal is to predict the proportion of factually correct statements in the model's response.

### Main Contributions

- **KEEN Probe**: The authors propose a simple probe, KEEN (Knowledge Estimation for Entities), trained over the model's internal representations of a subject to estimate how much the model knows about it. KEEN predicts both the model's question-answering accuracy and the factual accuracy of its open-ended generations.
- **Experimental Validation**: In experiments with LLMs of different sizes and types, KEEN correlates strongly with both question-answering accuracy and factual accuracy.
- **Interpretability**: A more interpretable variant of KEEN performs as well as the original and highlights a small set of tokens indicative of clusters and gaps in the model's knowledge.

### Experimental Setup

- **Dataset**: For the question-answering task, 3,472 subject entities were extracted from the PopQA dataset, with an average of 5.3 questions generated per subject.
  For the open-ended generation task, the FActScore dataset was used; it contains model-generated biographies, the atomic statements extracted from them, and a correctness label for each statement.
- **Baseline Methods**: The authors compared KEEN with baselines based on intrinsic features (such as fully connected activations and self-attention outputs) and extrinsic features (such as entity popularity).

### Results

- **Correlation**: KEEN outperforms the baselines across all models, with correlations of 0.60-0.68 with question-answering accuracy and 0.66-0.77 with factual accuracy.
- **Generalization Ability**: Beyond question answering, KEEN also predicts the factual accuracy of open-ended generations, showing that it generalizes across tasks.
- **Model Hedging Behavior**: KEEN scores align with the model's hedging behavior (i.e., responding with "I don't know" or similar when uncertain), suggesting that the model may decide whether to hedge based on features of its internal representations.
- **Reflection of Knowledge Changes**: After the model is fine-tuned on data about a set of target entities, KEEN scores for those entities increase, while scores for non-target entities remain relatively stable.

### Conclusion

KEEN is a simple, lightweight probe that quantifies a model's knowledge of a specific entity from its internal representations, before any text is generated. Its scores track both question-answering accuracy and open-ended factuality, and an equally performant interpretable variant exposes which tokens signal clusters and gaps in the model's knowledge. KEEN can guide developers' decisions, such as whether to apply further training or to augment queries with retrieval.
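The two evaluation targets and the correlation used to compare them with probe scores can be illustrated with a short sketch. It is a hypothetical example, not the authors' evaluation code: it assumes per-subject QA accuracy is the fraction of a subject's questions answered correctly, treats the FActScore-style target as the unpenalized fraction of extracted atomic statements labeled as supported, and uses Pearson correlation as the agreement measure. All function names and numbers below are invented for illustration and are not results from the paper.

```python
import math

def qa_accuracy(correct_flags):
    """Fraction of a subject's questions the model answered correctly."""
    return sum(correct_flags) / len(correct_flags)

def supported_fraction(statement_labels):
    """FActScore-style target: fraction of atomic statements labeled as supported."""
    return sum(statement_labels) / len(statement_labels)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up per-subject data: probe scores, QA outcomes, and statement labels.
probe_scores = [0.2, 0.5, 0.9]
qa_outcomes = [
    [False, False, False, False],
    [True, False, True, False],
    [True, True, True, True],
]
statement_labels = [
    [True, False, False, False],
    [True, True, False, False],
    [True, True, True, False],
]

qa_targets = [qa_accuracy(flags) for flags in qa_outcomes]
fact_targets = [supported_fraction(labels) for labels in statement_labels]
print(qa_targets)    # [0.0, 0.5, 1.0]
print(fact_targets)  # [0.25, 0.5, 0.75]
print(round(pearson(probe_scores, qa_targets), 3))
print(round(pearson(probe_scores, fact_targets), 3))
```

Pearson correlation measures linear agreement between probe scores and targets; a rank correlation such as Spearman's would be an alternative if only the ordering of subjects mattered.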