Time Awareness in Large Language Models: Benchmarking Fact Recall Across Time

David Herel, Vojtech Bartek, Tomas Mikolov
2024-09-20
Abstract: Who is the US President? The answer changes depending on when the question is asked. While large language models (LLMs) are evaluated on various reasoning tasks, they often miss a crucial dimension: time. In real-world scenarios, the correctness of answers is frequently tied to temporal context. In this paper, we introduce a novel dataset designed to rigorously test LLMs' ability to handle time-sensitive facts. Our benchmark offers a systematic way to measure how well LLMs align their knowledge with the correct time context, filling a key gap in current evaluation methods and offering a valuable tool for improving real-world applicability in future models.
Computation and Language, Artificial Intelligence