Large Language Models are Limited in Out-of-Context Knowledge Reasoning

Peng Hu, Changjiang Gao, Ruiqi Gao, Jiajun Chen, Shujian Huang
2024-09-27
Abstract: Large Language Models (LLMs) possess extensive knowledge and strong capabilities in performing in-context reasoning. However, previous work challenges their out-of-context reasoning ability, i.e., the ability to infer information from their training data instead of from the context or prompt. This paper focuses on a significant aspect of out-of-context reasoning: Out-of-Context Knowledge Reasoning (OCKR), which is combining multiple pieces of knowledge to infer new knowledge. We designed a synthetic dataset with seven representative OCKR tasks to systematically assess the OCKR capabilities of LLMs. Using this dataset, we evaluated several LLMs and discovered that their proficiency in this aspect is limited, regardless of whether the knowledge is trained in separate or adjacent training settings. Moreover, training the model to reason with reasoning examples does not result in significant improvement, while training the model to perform explicit knowledge retrieval helps with retrieving attribute knowledge but not relation knowledge, indicating that the model's limited OCKR capabilities are due to difficulties in knowledge retrieval. Furthermore, we treat cross-lingual knowledge transfer as a distinct form of OCKR and evaluate this ability. Our results show that the evaluated model also exhibits limited ability in transferring knowledge across languages.
Computation and Language
What problem does this paper attempt to address?
This paper addresses the limited ability of large language models (LLMs) to perform out-of-context knowledge reasoning (OCKR). Specifically, it asks whether LLMs can recall facts from their training data and reason with them at test time, even when those facts do not appear in the prompt. This requires combining multiple pieces of knowledge to infer new knowledge. The paper systematically evaluates LLMs on this ability using a synthetic dataset of seven representative OCKR tasks. It finds that the proficiency of LLMs is limited in both the separate and the adjacent training settings. In addition, training the model with reasoning examples does not significantly improve performance, while training the model to perform explicit knowledge retrieval helps with retrieving attribute knowledge but not relation knowledge, indicating that the model's limited OCKR capability stems from difficulties in knowledge retrieval. The paper also evaluates cross-lingual knowledge transfer as a distinct form of OCKR, and the results show that the evaluated model has limited ability in this respect as well.
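To make the idea of combining knowledge concrete, here is a minimal sketch of how one OCKR-style item might be assembled from a relation fact and an attribute fact. It is an illustration only, not the paper's dataset code: the entity names, the sentence templates, and the `build_relation_attribute_example` helper are all hypothetical, and the paper's seven task definitions are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OCKRExample:
    """One hypothetical out-of-context reasoning item."""
    training_facts: tuple  # statements the model would see only during training
    question: str          # test-time prompt, which contains none of the facts
    answer: str            # requires combining both facts


def build_relation_attribute_example(relations, attributes, subject, relation, attribute):
    """Compose a relation fact with an attribute fact (hypothetical templates)."""
    # Step 1: relation knowledge, e.g. (subject, "mentor") -> another entity.
    target = relations[(subject, relation)]
    # Step 2: attribute knowledge about that entity.
    answer = attributes[(target, attribute)]
    facts = (
        f"The {relation} of {subject} is {target}.",
        f"The {attribute} of {target} is {answer}.",
    )
    question = f"What is the {attribute} of the {relation} of {subject}?"
    return OCKRExample(facts, question, answer)


# Fictitious entities, so the answer cannot come from pretraining data.
example = build_relation_attribute_example(
    relations={("Xalor", "mentor"): "Brivek"},
    attributes={("Brivek", "birth year"): "1987"},
    subject="Xalor",
    relation="mentor",
    attribute="birth year",
)
```

In this sketch the two statements in `training_facts` would be used as training text, and only `question` would be shown at test time, so a correct answer requires the model to retrieve and combine both facts without seeing them in the prompt.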