Abstract: Large language models (LLMs) have been increasingly applied to tasks in language understanding and interactive decision-making, with their impressive performance largely attributed to the extensive domain knowledge embedded within them. However, the depth and breadth of this knowledge can vary across domains. Many existing approaches assume that LLMs possess a comprehensive understanding of their environment, often overlooking potential gaps in their grasp of actual world dynamics. To address this, we introduce Discover, Verify, and Evolve (DiVE), a framework that discovers world dynamics from a small number of demonstrations, verifies the accuracy of these dynamics, and evolves new, advanced dynamics tailored to the current situation. Through extensive evaluations, we assess the impact of each component on performance and compare the dynamics generated by DiVE to human-annotated dynamics. Our results show that LLMs guided by DiVE make more informed decisions, achieving rewards comparable to human players in the Crafter environment and surpassing methods that require prior task-specific training in the MiniHack environment.

What problem does this paper attempt to address?

The paper addresses a gap in how large language models (LLMs) are applied to language understanding and interactive decision-making: because of limits in their training data, these models may lack an in-depth understanding of the world dynamics of a specific domain. Although LLMs acquire a broad world view by assimilating Internet-scale knowledge, the depth and breadth of that knowledge vary across domains. Many existing methods assume that LLMs understand their environment comprehensively and overlook potential gaps in their grasp of the actual world dynamics.

To address this issue, the paper proposes the Discover, Verify, and Evolve (DiVE) framework, which discovers world dynamics from a small number of demonstrations, verifies the accuracy of these dynamics, and evolves new, more advanced dynamics suited to the current situation. Through extensive evaluation, the paper examines the impact of each component on performance and compares the dynamics generated by DiVE with those annotated by humans. The results show that LLMs guided by DiVE make more informed decisions, achieving rewards comparable to human players in the Crafter environment and outperforming methods that require task-specific training in the MiniHack environment.

Specifically, the paper focuses on the following aspects:

- **Knowledge Gap**: The paper defines the relationship between the knowledge set \(K_{LLM}\) embedded in LLMs and the set \(K_{target}\) of knowledge related to the target domain. For LLMs to be effective, the subset \(K_{relevant}\) of \(K_{LLM}\) that relates to \(K_{target}\) should cover \(K_{target}\) as widely as possible and contain more reliable knowledge \(K^+\) than inaccurate knowledge \(K^-\), where \(K_{relevant}=K^+\cup K^-\).
- **DiVE Framework**: The framework consists of three parts: Discoverer, Verifier, and Evolver. The Discoverer iteratively uncovers environmental dynamics from demonstrations using curriculum learning; the Verifier eliminates unreliable dynamics caused by the tendency of LLMs to hallucinate; the Evolver reasons out in-depth, state-specific strategies for the current situation based on the learned dynamics.
- **Experimental Evaluation**: The paper evaluates DiVE in two environments, Crafter and MiniHack, demonstrating its advantages in learning comprehensive and reliable dynamics, guiding the agent's decision-making process, and evolving in-depth strategies. DiVE achieves rewards comparable to human players in Crafter and outperforms methods that require task-specific training in MiniHack.

In conclusion, the main contribution of this paper is a framework that learns world dynamics from demonstrations and guides LLM decision-making through online evolution of situational strategies, thereby bridging potential knowledge gaps and improving the decision-making efficiency of LLMs.
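The division of labor among the three DiVE components can be made concrete with a toy sketch. This is not the paper's implementation: every name below (`Dynamic`, `discover`, `verify`, `evolve`, `toy_propose`) is hypothetical, the LLM calls are replaced by deterministic stand-ins, and the curriculum used by the Discoverer is omitted. The sketch assumes a dynamic can be written as a (state, action, outcome) rule and that verification means checking a candidate rule against the demonstrations.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# Hypothetical representation: one demonstrated step is (state, action, outcome).
Transition = Tuple[str, str, str]


@dataclass(frozen=True)
class Dynamic:
    """A candidate rule: taking `action` in `state` yields `outcome`."""
    state: str
    action: str
    outcome: str


def discover(demos: List[Transition],
             propose: Callable[[Transition], Iterable[Dynamic]]) -> List[Dynamic]:
    """Discoverer stand-in: collect candidate rules proposed for each demonstration.

    `propose` plays the role of an LLM call and may return wrong rules.
    """
    candidates: List[Dynamic] = []
    for transition in demos:
        candidates.extend(propose(transition))
    return list(dict.fromkeys(candidates))  # drop duplicates, keep order


def verify(candidates: List[Dynamic], demos: List[Transition]) -> List[Dynamic]:
    """Verifier stand-in: keep only rules that some demonstration supports."""
    observed = set(demos)
    return [d for d in candidates if (d.state, d.action, d.outcome) in observed]


def evolve(verified: List[Dynamic], current_state: str) -> List[str]:
    """Evolver stand-in: turn verified rules that apply now into strategy hints."""
    return [f"{d.action} -> {d.outcome}" for d in verified if d.state == current_state]


def toy_propose(transition: Transition) -> Iterable[Dynamic]:
    """Assumed proposer: one correct rule plus one simulated hallucination."""
    state, action, outcome = transition
    yield Dynamic(state, action, outcome)
    yield Dynamic(state, action, "diamond")  # deliberately unsupported rule


# Made-up demonstrations, loosely themed on a crafting game.
demos = [("near tree", "collect", "wood"), ("near table", "craft", "pickaxe")]
candidates = discover(demos, toy_propose)
verified = verify(candidates, demos)
strategies = evolve(verified, "near tree")
```

In this toy run the proposer emits four candidate rules, the verification step discards the two unsupported "diamond" rules, and only the rule applicable to the current state is turned into a strategy hint. In terms of the knowledge-gap notation above, the verification step is what is meant to shift the retained rules toward \(K^+\) and away from \(K^-\).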

Enhancing Agent Learning through World Dynamics Modeling

Learning to Model the World with Language

Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information

Language Models Meet World Models: Embodied Experiences Enhance Language Models

Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization

World Models with Hints of Large Language Models for Goal Achieving

Introspective Tips: Large Language Model for In-Context Decision Making

LLaMA Rider: Spurring Large Language Models to Explore the Open World

MindAgent: Emergent Gaming Interaction

DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences

Making Large Language Models into World Models with Precondition and Effect Knowledge

Optimizing Large Language Models for Dynamic Constraints through Human-in-the-Loop Discriminators

STRIDE: A Tool-Assisted LLM Agent Framework for Strategic and Interactive Decision-Making

WALL-E: World Alignment by Rule Learning Improves World Model-based LLM Agents

PIANIST: Learning Partially Observable World Models with LLMs for Multi-Agent Decision Making

Intelligent Decision-Making and Human Language Communication Based on Deep Reinforcement Learning in a Wargame Environment

LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments

Grounding Large Language Models In Embodied Environment With Imperfect World Models

Language-Guided World Models: A Model-Based Approach to AI Control

Mutual Enhancement of Large Language and Reinforcement Learning Models through Bi-Directional Feedback Mechanisms: A Case Study