Temporal Blind Spots in Large Language Models

Jonas Wallat, Adam Jatowt, Avishek Anand
2024-01-23
Abstract: Large language models (LLMs) have recently gained significant attention due to their unparalleled ability to perform various natural language processing tasks. These models, benefiting from their advanced natural language understanding capabilities, have demonstrated impressive zero-shot performance. However, the pre-training data utilized in LLMs is often confined to a specific corpus, resulting in inherent freshness and temporal scope limitations. Consequently, this raises concerns regarding the effectiveness of LLMs for tasks involving temporal intents. In this study, we aim to investigate the underlying limitations of general-purpose LLMs when deployed for tasks that require a temporal understanding. We pay particular attention to handling factual temporal knowledge through three popular temporal QA datasets. Specifically, we observe low performance on detailed questions about the past and, surprisingly, for rather new information. In manual and automatic testing, we find multiple temporal errors and characterize the conditions under which QA performance deteriorates. Our analysis contributes to understanding LLM limitations and offers valuable insights into developing future models that can better cater to the demands of temporally-oriented tasks. The code is available.
Computation and Language
What problem does this paper attempt to address?
This paper examines the limitations of large language models (LLMs) on tasks that involve temporal information. Although LLMs perform well on many natural language processing tasks, they are pre-trained on a fixed corpus, which limits both the freshness and the temporal scope of their knowledge. The authors run experiments on three popular temporal question-answering datasets and find that performance is low on detailed questions about the past and, surprisingly, on rather new information, indicating blind spots in temporal knowledge. The paper characterizes the conditions under which these errors occur and offers insights for developing future models that better handle temporally oriented tasks. The study also points out that LLMs tend to capture newer information better than older information, but that this trend may weaken after a certain point, suggesting a form of temporal inertia. Additionally, the models are sensitive to how time is referenced: performance may decline with relative time references (as opposed to absolute ones), because the model may fail to process such expressions correctly. Overall, the paper aims to understand the temporal knowledge limitations of LLMs and to provide guidance for improving them.
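To make the distinction between relative and absolute time references concrete, the following is a minimal sketch of how a relatively phrased question could be rewritten into an absolutely phrased one. It is an illustration only, not the authors' evaluation code: the helper name `to_absolute`, the `reference_year` parameter, and the example question are all hypothetical, and the pattern handles only the simple "N years ago" form.

```python
import re


def to_absolute(question: str, reference_year: int) -> str:
    """Rewrite 'N year(s) ago' as 'in YEAR', counted back from reference_year.

    Hypothetical helper for illustration; only the 'N years ago' pattern
    is handled, and all other text is returned unchanged.
    """
    def _substitute(match: re.Match) -> str:
        years_back = int(match.group(1))
        return f"in {reference_year - years_back}"

    return re.sub(r"(\d+) years? ago", _substitute, question)


# A relative reference depends on an implicit "now"; the absolute form does not.
relative_question = "Who won the FIFA World Cup 5 years ago?"
absolute_question = to_absolute(relative_question, reference_year=2023)
print(absolute_question)  # Who won the FIFA World Cup in 2018?
```

Posing both variants of the same question to a model and comparing the answers is one simple way to probe the sensitivity described above; the reference year has to be supplied explicitly, since a model has no reliable notion of the current date.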