Large Language Models Can Self-Improve in Long-context Reasoning

Siheng Li, Cheng Yang, Zesen Cheng, Lemao Liu, Mo Yu, Yujiu Yang, Wai Lam

2024-11-13

Abstract: Large language models (LLMs) have achieved substantial progress in processing long contexts but still struggle with long-context reasoning. Existing approaches typically involve fine-tuning LLMs with synthetic data, which depends on annotations from human experts or advanced models like GPT-4, thus restricting further advancements. To address this issue, we investigate the potential for LLMs to self-improve in long-context reasoning and propose SEALONG, an approach specifically designed for this purpose. This approach is straightforward: we sample multiple outputs for each question, score them with Minimum Bayes Risk, and then apply supervised fine-tuning or preference optimization based on these outputs. Extensive experiments on several leading LLMs demonstrate the effectiveness of SEALONG, with an absolute improvement of $4.2$ points for Llama-3.1-8B-Instruct. Furthermore, SEALONG achieves superior performance compared to prior approaches that depend on data produced by human experts or advanced models. We anticipate that this work will open new avenues for self-improvement techniques in long-context scenarios, which are essential for the continual advancement of LLMs.

Computation and Language, Artificial Intelligence

What problem does this paper attempt to address?

This paper addresses the weakness of large language models (LLMs) in long-context reasoning. Although existing LLMs have made significant progress in processing long texts, they still perform poorly on tasks that require reasoning across multiple paragraphs. To overcome this limitation, the paper proposes a method called SEALONG, which enables LLMs to self-improve in long-context reasoning. The method has three steps:

1. **Sampling multiple reasoning paths**: For each question and its corresponding long context, a plan-and-solve prompting strategy is used to sample multiple reasoning paths from the LLM.
2. **Scoring mechanism**: The outputs are scored using Minimum Bayes Risk (MBR), which prioritizes reasoning paths that are consistent with the majority of outputs.
3. **Supervised fine-tuning or preference optimization**: Based on the scores, high-scoring outputs can be used for supervised fine-tuning, or both high-scoring and low-scoring outputs can be used for preference optimization.

Through these steps, SEALONG improves the performance of LLMs on long-context reasoning tasks without annotations from human experts or advanced models. Experimental results show that SEALONG achieves significant performance improvements across multiple LLMs, particularly on multi-document question-answering tasks.
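The scoring and data-selection steps described above can be illustrated with a minimal sketch. It assumes the sampled outputs are already available as strings, and it uses a simple token-overlap (Jaccard) similarity as a stand-in, since this summary does not say which similarity measure the paper uses. The function names `jaccard`, `mbr_scores`, and `build_training_examples`, as well as the dictionary keys, are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; an assumed stand-in for the paper's measure."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def mbr_scores(
    outputs: List[str],
    similarity: Callable[[str, str], float] = jaccard,
) -> List[float]:
    """Score each output by its average similarity to all other outputs.

    Outputs that agree with the majority of samples receive higher scores.
    """
    n = len(outputs)
    if n < 2:
        return [0.0] * n
    scores = []
    for i, candidate in enumerate(outputs):
        total = sum(
            similarity(candidate, other)
            for j, other in enumerate(outputs)
            if j != i
        )
        scores.append(total / (n - 1))
    return scores


def build_training_examples(
    question: str, outputs: List[str]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Turn scored samples into one SFT example and one preference pair."""
    scores = mbr_scores(outputs)
    ranked = sorted(range(len(outputs)), key=lambda i: scores[i], reverse=True)
    best, worst = outputs[ranked[0]], outputs[ranked[-1]]
    # Highest-scoring output becomes the supervised fine-tuning target.
    sft_example = {"prompt": question, "completion": best}
    # Highest- and lowest-scoring outputs form a chosen/rejected pair.
    preference_pair = {"prompt": question, "chosen": best, "rejected": worst}
    return sft_example, preference_pair


if __name__ == "__main__":
    samples = [
        "the answer is paris",
        "the answer is paris france",
        "the answer is london",
    ]
    sft, pref = build_training_examples("Which city is meant?", samples)
    print(mbr_scores(samples))
    print(sft)
    print(pref)
```

In this toy run the two outputs mentioning Paris agree with each other more than either agrees with the London output, so a Paris output is selected as the fine-tuning target and the London output becomes the rejected side of the preference pair. A real pipeline would sample the outputs from the model itself and would likely use a stronger similarity measure than token overlap.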

Enhancing Large Language Models' Situated Faithfulness to External Contexts

Supervised Knowledge Makes Large Language Models Better In-context Learners

Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

Language Model Self-improvement by Reinforcement Learning Contemplation

Introspective Tips: Large Language Model for In-Context Decision Making

Long-context LLMs Struggle with Long In-context Learning

Large Language Models are reasoners with Self-Verification

Large Language Models have Intrinsic Self-Correction Ability

LooGLE: Can Long-Context Language Models Understand Long Contexts?

Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning

Large Language Models Cannot Self-Correct Reasoning Yet

Large Language Models are In-context Teachers for Knowledge Reasoning

BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack

Retrieval Meets Reasoning: Dynamic In-Context Editing for Long-Text Understanding

Self-Discover: Large Language Models Self-Compose Reasoning Structures

Large Language Models Are In-Context Semantic Reasoners Rather Than Symbolic Reasoners

Large Language Models Know What Makes Exemplary Contexts

Enhancing Large Language Model with Self-Controlled Memory Framework

Enhancing Large Language Models in Coding Through Multi-Perspective Self-Consistency