Can Large Language Models Understand Context?

Yilun Zhu, Joel Ruben Antony Moniz, Shruti Bhargava, Jiarui Lu, Dhivya Piraviperumal, Site Li, Yuan Zhang, Hong Yu, Bo-Hsiang Tseng

2024-02-02

Abstract: Understanding context is key to understanding human language, an ability which Large Language Models (LLMs) have been increasingly seen to demonstrate to an impressive extent. However, though the evaluation of LLMs encompasses various domains within the realm of Natural Language Processing, limited attention has been paid to probing their linguistic capability of understanding contextual features. This paper introduces a context understanding benchmark by adapting existing datasets to suit the evaluation of generative models. This benchmark consists of four distinct tasks and nine datasets, all featuring prompts designed to assess the models' ability to understand context. First, we evaluate the performance of LLMs under the in-context learning pretraining scenario. Experimental results indicate that pre-trained dense models struggle with understanding more nuanced contextual features when compared to state-of-the-art fine-tuned models. Second, as LLM compression holds growing significance in both research and real-world applications, we assess the context understanding of quantized models under in-context learning settings. We find that 3-bit post-training quantization leads to varying degrees of performance reduction on our benchmark. We conduct an extensive analysis of these scenarios to substantiate our experimental results.
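To make the in-context learning setup more concrete, here is one way a few-shot prompt could be assembled for a context-dependent task such as query rewriting. This is a minimal sketch under stated assumptions: the `Exemplar` class, the `build_few_shot_prompt` function, the instruction wording and the `Input:`/`Output:` template are all hypothetical and are not the prompt format used in the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Exemplar:
    """One demonstration shown to the model (hypothetical schema)."""
    context: str  # preceding dialogue turns or discourse
    target: str   # gold answer, e.g. a self-contained rewritten query


def build_few_shot_prompt(instruction: str, exemplars: List[Exemplar], context: str) -> str:
    """Concatenate an instruction, k demonstrations, and the test input.

    The model is expected to continue the text after the final "Output:" marker.
    """
    parts = [instruction.strip(), ""]
    for ex in exemplars:
        parts.append(f"Input: {ex.context}")
        parts.append(f"Output: {ex.target}")
        parts.append("")  # blank line separates demonstrations
    parts.append(f"Input: {context}")
    parts.append("Output:")  # left open for the model to complete
    return "\n".join(parts)


if __name__ == "__main__":
    demos = [
        Exemplar(
            context="User: Who directed Inception? | Bot: Christopher Nolan. | User: When was he born?",
            target="When was Christopher Nolan born?",
        ),
    ]
    prompt = build_few_shot_prompt(
        "Rewrite the last user turn so it can be understood without the dialogue history.",
        demos,
        "User: What is the capital of France? | Bot: Paris. | User: How big is it?",
    )
    print(prompt)
```

In this sketch, changing the number of exemplars passed in is how a k-shot condition would be configured; the model's continuation after the final `Output:` marker would then be scored against the gold answer for the test input.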

Computation and Language

What problem does this paper attempt to address?

This paper mainly addresses how to evaluate the ability of large language models (LLMs) to understand context. The authors point out that although LLMs perform well on a wide range of natural language processing tasks, relatively little research has probed their ability to understand and process contextual features. The paper therefore introduces a context understanding benchmark that adapts existing datasets to the evaluation of generative models. The benchmark comprises four tasks and nine datasets, with prompts designed specifically to assess a model's ability to understand context. The specific questions are:

1. **Evaluation of context understanding**: Using a benchmark built from four tasks (coreference resolution, dialogue state tracking, implicit discourse relation classification, and query rewriting), evaluate how LLMs perform across different kinds of context understanding.
2. **Comparison between pre-trained and fine-tuned models**: Examine how pre-trained dense models differ from state-of-the-art fine-tuned models in understanding more nuanced contextual features.
3. **Impact of model compression**: Evaluate the context understanding of quantized models in the in-context learning setting, in particular the performance degradation caused by 3-bit post-training quantization.

Together, these studies aim to provide a comprehensive evaluation framework for understanding the strengths and limitations of LLMs in context understanding, and for exploring how model compression techniques affect this ability.
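As a rough illustration of why 3-bit post-training quantization can cost accuracy, the sketch below applies simple asymmetric round-to-nearest (RTN) quantization to one group of weights. It is an assumption for illustration only: `quantize_rtn` and `dequantize` are hypothetical helpers, and RTN is a plain baseline, not necessarily the quantization method used in the paper (methods such as GPTQ choose the integer codes more carefully to reduce error). With 3 bits, every weight in a group must be snapped to one of only 8 levels.

```python
from typing import List, Tuple


def quantize_rtn(weights: List[float], bits: int = 3) -> Tuple[List[int], float, float]:
    """Asymmetric round-to-nearest quantization of a single weight group.

    Each float is mapped to an integer code in [0, 2**bits - 1] using one
    shared scale and the group minimum as the offset.
    """
    levels = 2 ** bits - 1  # 3 bits -> 8 codes, i.e. 7 steps between them
    w_min, w_max = min(weights), max(weights)
    # Guard against a constant group, where the range would be zero.
    scale = (w_max - w_min) / levels if w_max > w_min else 1.0
    codes = [min(levels, max(0, round((w - w_min) / scale))) for w in weights]
    return codes, scale, w_min


def dequantize(codes: List[int], scale: float, w_min: float) -> List[float]:
    """Reconstruct approximate weights from the integer codes."""
    return [c * scale + w_min for c in codes]


if __name__ == "__main__":
    group = [-0.91, -0.42, -0.05, 0.13, 0.38, 0.77]  # illustrative values
    codes, scale, offset = quantize_rtn(group, bits=3)
    recon = dequantize(codes, scale, offset)
    max_err = max(abs(a - b) for a, b in zip(group, recon))
    print("codes:", codes)
    print("max abs error:", max_err, "(bounded by scale / 2 =", scale / 2, ")")
```

Practical schemes usually apply this per small group or per output channel, each with its own scale and offset, because a single outlier weight would otherwise stretch the scale and coarsen the grid for every other weight in the tensor.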