Large Language Models are Limited in Out-of-Context Knowledge Reasoning

Peng Hu, Changjiang Gao, Ruiqi Gao, Jiajun Chen, Shujian Huang

2024-09-27

Abstract: Large Language Models (LLMs) possess extensive knowledge and strong capabilities in performing in-context reasoning. However, previous work challenges their out-of-context reasoning ability, i.e., the ability to infer information from their training data instead of from the context or prompt. This paper focuses on a significant aspect of out-of-context reasoning: Out-of-Context Knowledge Reasoning (OCKR), which is to combine multiple pieces of knowledge to infer new knowledge. We designed a synthetic dataset with seven representative OCKR tasks to systematically assess the OCKR capabilities of LLMs. Using this dataset, we evaluated several LLMs and discovered that their proficiency in this aspect is limited, regardless of whether the knowledge is trained in separate or adjacent training settings. Moreover, training the model to reason with reasoning examples does not result in significant improvement, while training the model to perform explicit knowledge retrieval helps for retrieving attribute knowledge but not relation knowledge, indicating that the model's limited OCKR capabilities are due to difficulties in knowledge retrieval. Furthermore, we treat cross-lingual knowledge transfer as a distinct form of OCKR and evaluate this ability. Our results show that the evaluated model also exhibits limited ability in transferring knowledge across languages.

Computation and Language

What problem does this paper attempt to address?

This paper addresses the limited ability of large language models (LLMs) to perform out-of-context knowledge reasoning (OCKR). Specifically, it asks whether LLMs can recall facts from their training data and reason with them at test time, even when those facts do not appear in the test-time prompt or context. This requires combining multiple pieces of knowledge to infer new knowledge. The paper systematically evaluates LLMs on this ability using a synthetic dataset of seven representative OCKR tasks. The study finds that LLM proficiency is limited whether the knowledge is trained in separate or adjacent training settings. In addition, training the model on reasoning examples does not significantly improve performance, while training it to perform explicit knowledge retrieval helps with retrieving attribute knowledge but not relation knowledge, which indicates that the limitation stems from difficulties in knowledge retrieval. The paper also evaluates cross-lingual knowledge transfer as a distinct form of OCKR, and the results show that the evaluated model has limited ability there as well.
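To make the idea of combining a relation fact with an attribute fact concrete, here is a minimal Python sketch of how one such test item might be assembled. It is not the paper's dataset or task format, which this page does not describe. The `Fact` class, the `build_two_hop_item` helper, the sentence templates, and the invented names are all hypothetical and serve only as illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One synthetic knowledge triple (hypothetical format, not the paper's)."""
    subject: str
    predicate: str
    obj: str

    def as_training_sentence(self) -> str:
        # Each fact would be shown to the model only during training.
        return f"The {self.predicate} of {self.subject} is {self.obj}."


def build_two_hop_item(relation_fact: Fact, attribute_fact: Fact) -> dict:
    """Chain a relation fact and an attribute fact into one test question.

    The question names neither fact's content directly, so answering it
    requires recalling both facts from training rather than from the prompt.
    """
    if relation_fact.obj != attribute_fact.subject:
        raise ValueError("facts do not chain: relation object must be the attribute subject")
    question = (
        f"What is the {attribute_fact.predicate} of the "
        f"{relation_fact.predicate} of {relation_fact.subject}?"
    )
    return {
        "training_sentences": [
            relation_fact.as_training_sentence(),
            attribute_fact.as_training_sentence(),
        ],
        "question": question,
        "answer": attribute_fact.obj,
    }


# Invented entities, so the answer cannot come from pretraining knowledge.
relation = Fact("Alven Torr", "mother", "Mira Kest")
attribute = Fact("Mira Kest", "birth year", "1950")
item = build_two_hop_item(relation, attribute)
print(item["question"])
```

In this sketch the two training sentences are returned separately; whether they are placed in different training documents or next to each other would correspond loosely to the separate and adjacent settings mentioned above, though the exact setup used in the paper is an assumption here.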

Concise and Organized Perception Facilitates Large Language Models for Deductive Reasoning

LLMs for Relational Reasoning: How Far are We?

Do Large Language Models Understand Logic or Just Mimick Context?

CLR-Fact: Evaluating the Complex Logical Reasoning Capability of Large Language Models over Factual Knowledge

Large Language Models Are In-Context Semantic Reasoners Rather Than Symbolic Reasoners

Large Language Models Are Cross-Lingual Knowledge-Free Reasoners

Chain-of-Knowledge: Integrating Knowledge Reasoning into Large Language Models by Learning from Knowledge Graphs

Retrieval Meets Reasoning: Dynamic In-Context Editing for Long-Text Understanding

Unleashing Multi-Hop Reasoning Potential in Large Language Models through Repetition of Misordered Context

CLR-Bench: Evaluating Large Language Models in College-level Reasoning

On the Hardness of Faithful Chain-of-Thought Reasoning in Large Language Models

Reasoning on Graphs: Faithful and Interpretable Large Language Model Reasoning

Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?

Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data

Can Large Language Models Create New Knowledge for Spatial Reasoning Tasks?

Unveiling Factual Recall Behaviors of Large Language Models through Knowledge Neurons

On Exploring the Reasoning Capability of Large Language Models with Knowledge Graphs

Rethinking with Retrieval: Faithful Large Language Model Inference

Large Language Models are In-context Teachers for Knowledge Reasoning

Reasoning Abilities of Large Language Models: In-Depth Analysis on the Abstraction and Reasoning Corpus