Unveiling LLMs: The Evolution of Latent Representations in a Dynamic Knowledge Graph

Marco Bronzini, Carlo Nicolini, Bruno Lepri, Jacopo Staiano, Andrea Passerini
2024-08-06
Abstract: Large Language Models (LLMs) demonstrate an impressive capacity to recall a vast range of factual knowledge. However, understanding their underlying reasoning and internal mechanisms in exploiting this knowledge remains a key research area. This work unveils the factual information an LLM represents internally for sentence-level claim verification. We propose an end-to-end framework to decode factual knowledge embedded in token representations from a vector space to a set of ground predicates, showing its layer-wise evolution using a dynamic knowledge graph. Our framework employs activation patching, a vector-level technique that alters a token representation during inference, to extract encoded knowledge. Accordingly, we neither rely on training nor external models. Using factual and common-sense claims from two claim verification datasets, we showcase interpretability analyses at local and global levels. The local analysis highlights entity centrality in LLM reasoning, from claim-related information and multi-hop reasoning to representation errors causing erroneous evaluation. On the other hand, the global analysis reveals trends in the underlying evolution, such as word-based knowledge evolving into claim-related facts. By interpreting semantics from LLM latent representations and enabling graph-related analyses, this work enhances the understanding of the factual knowledge resolution process.
Computation and Language, Artificial Intelligence, Computers and Society
What problem does this paper attempt to address?
The paper aims to understand how large language models (LLMs) use their internally stored factual knowledge, and which reasoning mechanisms they apply, when judging the truthfulness of an input sentence. Specifically, the researchers focus on three research questions:

1. **What factual knowledge do LLMs use to evaluate the truthfulness of the input?** The paper proposes a framework that decodes the factual knowledge embedded in an LLM's latent representations, revealing the facts the model draws on when verifying short claims.
2. **How does this knowledge evolve across the layers of the model?** Using a dynamic knowledge graph, the researchers trace how factual knowledge changes across the hidden layers of an LLM, from word-level information to claim-related facts.
3. **Does this evolution follow recognizable patterns?** Local and global interpretability analyses reveal patterns in how LLMs process factual knowledge, such as the importance of intermediate layers and the shift from word-level information to claim-related facts.

To answer these questions, the paper proposes an end-to-end framework that can:

- **Decode the semantics embedded in the latent space of LLMs**, expressed as ground predicates, without relying on external models or training.
- **Extend the single-token patching technique**, using the additive nature of LLM token representations to jointly probe the semantics of multiple tokens.
- **Use graph representations to capture the encoded factual knowledge and track its underlying evolution**.
- **Support global and local interpretability analyses**, revealing how word-level knowledge evolves into claim-related facts and which representation errors lead to incorrect evaluations.
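The multi-token patching idea described above can be illustrated with a toy sketch. It is not the authors' implementation: the "layers" are simple stand-in functions rather than transformer blocks, mean pooling is an assumed way of combining several token vectors, and the names `mean_pool`, `run_layers`, and `make_shift` are hypothetical. The sketch only shows the mechanics of pooling token representations from a source pass and overwriting one position in a target pass at a chosen layer.

```python
from typing import Callable, List, Optional, Tuple

Vector = List[float]
Layer = Callable[[List[Vector]], List[Vector]]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average several token vectors into one (an assumed aggregation choice)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def run_layers(
    hidden: List[Vector],
    layers: List[Layer],
    patch: Optional[Tuple[int, int, Vector]] = None,
) -> List[List[Vector]]:
    """Run a toy layer stack and return the hidden states after every layer.

    If `patch` is (layer_index, position, vector), the vector overwrites the
    hidden state at `position` right after layer `layer_index` has run.
    """
    states = []
    for index, layer in enumerate(layers):
        hidden = layer(hidden)
        if patch is not None and patch[0] == index:
            _, position, vector = patch
            hidden = [list(v) for v in hidden]  # copy before overwriting
            hidden[position] = list(vector)
        states.append(hidden)
    return states


def make_shift(delta: float) -> Layer:
    """Toy stand-in for a layer: adds a constant to every coordinate."""
    return lambda hidden: [[x + delta for x in v] for v in hidden]


layers = [make_shift(1.0), make_shift(10.0)]

# Source pass: three claim tokens with two dimensions each.
source_states = run_layers([[0.0, 0.0], [2.0, 4.0], [4.0, 8.0]], layers)

# Pool all claim tokens at layer 0 into a single summary vector.
summary = mean_pool(source_states[0])

# Target pass: overwrite position 1 (a placeholder token) at layer 0.
target_states = run_layers([[0.0, 0.0], [0.0, 0.0]], layers, patch=(0, 1, summary))
```

In a real setting the patched position would belong to a prompt that asks the model to verbalize what the vector encodes, and the later layers would be actual transformer blocks operating on the patched state.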
Through these methods, the paper not only improves the understanding of how LLMs resolve factual knowledge but also advances the mechanistic interpretability of language models.
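To make the idea of a layer-wise dynamic knowledge graph more concrete, here is a minimal sketch under stated assumptions. Each layer is represented as a set of (subject, relation, object) triples, consecutive layers are compared to see which facts appear or disappear, and a simple degree count stands in for entity centrality. The function names `layer_deltas` and `degree_centrality` are hypothetical, and the triples are invented placeholders, not outputs reported in the paper.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def layer_deltas(snapshots: Dict[int, Set[Triple]]) -> List[dict]:
    """Compare consecutive layer snapshots and report added/removed triples."""
    ordered = sorted(snapshots)
    deltas = []
    for previous, current in zip(ordered, ordered[1:]):
        deltas.append({
            "from": previous,
            "to": current,
            "added": snapshots[current] - snapshots[previous],
            "removed": snapshots[previous] - snapshots[current],
        })
    return deltas


def degree_centrality(triples: Set[Triple]) -> Counter:
    """Count how many triples each entity takes part in (subject or object)."""
    counts: Counter = Counter()
    for subject, _, obj in triples:
        counts[subject] += 1
        counts[obj] += 1
    return counts


# Invented placeholder triples keyed by layer index, NOT results from the paper.
snapshots = {
    4: {("Paris", "is_a", "word")},
    16: {("Paris", "is_a", "city"), ("Paris", "located_in", "France")},
    28: {("Paris", "located_in", "France"), ("Paris", "capital_of", "France")},
}

deltas = layer_deltas(snapshots)
centrality = degree_centrality(snapshots[28])
```

Sets keep the comparison between layers simple; a graph library would be a reasonable substitute if richer analyses such as path finding were needed.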