Abstract: Large language models (LLMs) have been increasingly applied to tasks in language understanding and interactive decision-making, with their impressive performance largely attributed to the extensive domain knowledge embedded within them. However, the depth and breadth of this knowledge can vary across domains. Many existing approaches assume that LLMs possess a comprehensive understanding of their environment, often overlooking potential gaps in their grasp of actual world dynamics. To address this, we introduce Discover, Verify, and Evolve (DiVE), a framework that discovers world dynamics from a small number of demonstrations, verifies the accuracy of these dynamics, and evolves new, advanced dynamics tailored to the current situation. Through extensive evaluations, we assess the impact of each component on performance and compare the dynamics generated by DiVE to human-annotated dynamics. Our results show that LLMs guided by DiVE make more informed decisions, achieving rewards comparable to human players in the Crafter environment and surpassing methods that require prior task-specific training in the MiniHack environment.
What problem does this paper attempt to address?
The paper addresses the following problem: when large language models (LLMs) are applied to language understanding and interactive decision-making, limitations in their training data mean they may lack an in-depth understanding of the world dynamics of a specific domain. Although LLMs acquire a broad view of the world by assimilating Internet-scale knowledge, the depth and breadth of that knowledge vary across domains. Many existing methods assume that LLMs comprehensively understand their environment and overlook potential gaps in their grasp of the actual world dynamics. To close these gaps, the paper proposes the Discover, Verify, and Evolve (DiVE) framework, which discovers world dynamics from a small number of demonstrations, verifies the accuracy of these dynamics, and evolves new, more advanced dynamics suited to the current situation. Through extensive evaluation, the paper measures the contribution of each component to performance and compares the dynamics generated by DiVE with human-annotated dynamics. The results show that LLMs guided by DiVE make more informed decisions, achieving rewards comparable to human players in the Crafter environment and outperforming methods that require task-specific training in the MiniHack environment.
Specifically, the paper focuses on the following aspects:
- **Knowledge Gap**: The paper formalizes the relationship between the knowledge set \(K_{LLM}\) embedded in an LLM and the set \(K_{target}\) of knowledge relevant to the target domain. The portion of \(K_{LLM}\) that pertains to \(K_{target}\), denoted \(K_{relevant}\), splits into reliable knowledge \(K^+\) and inaccurate knowledge \(K^-\), i.e., \(K_{relevant}=K^+\cup K^-\). For an LLM to be effective, \(K_{relevant}\) should cover \(K_{target}\) as broadly as possible while consisting mostly of \(K^+\) rather than \(K^-\).
- **DiVE Framework**: The DiVE framework consists of three components: the Discoverer, the Verifier, and the Evolver. The Discoverer iteratively uncovers environment dynamics from demonstrations using a curriculum-learning approach; the Verifier filters out unreliable dynamics that arise from the LLM's tendency to hallucinate; and the Evolver reasons from the learned dynamics to derive in-depth, state-specific strategies for the current situation.
- **Experimental Evaluation**: The paper evaluates DiVE in two environments, Crafter and MiniHack, demonstrating its advantages in learning comprehensive and reliable dynamics, guiding the agent's decision-making, and evolving in-depth strategies. DiVE achieves rewards comparable to human players in Crafter and outperforms methods that require task-specific training in MiniHack.
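The three-stage pipeline and the \(K^+\)/\(K^-\) split described above can be illustrated with a deliberately simplified toy sketch. Everything in it is an assumption for illustration: the `Dynamic` and `Transition` types, the `verify` and `evolve` functions, and the Crafter-flavored rules are hypothetical and are not DiVE's actual implementation, which relies on LLM prompting at every stage. Here a fixed candidate list stands in for the Discoverer's output (including one deliberately hallucinated rule), `verify` keeps a rule only if it agrees with every demonstration step that uses its action (a rough proxy for separating \(K^+\) from \(K^-\)), and `evolve` chains verified rules backward into a plan for the current state. Item consumption and spatial requirements are ignored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Dynamic:
    """Hypothetical rule: doing `action` succeeds iff `requires` is held, and yields `yields`."""
    action: str
    requires: frozenset
    yields: str


@dataclass(frozen=True)
class Transition:
    """Hypothetical demo step: inventory before acting, the action, and the item gained (None = failed)."""
    inventory: frozenset
    action: str
    gained: Optional[str]


# Toy demonstrations (assumed data, loosely Crafter-flavored).
DEMOS = [
    Transition(frozenset(), "collect_wood", "wood"),
    Transition(frozenset({"wood"}), "place_table", "table"),
    Transition(frozenset({"wood", "table"}), "make_wood_pickaxe", "wood_pickaxe"),
    Transition(frozenset({"wood", "table", "wood_pickaxe"}), "collect_stone", "stone"),
    Transition(frozenset({"wood", "table", "wood_pickaxe"}), "collect_diamond", None),
]

# Stand-in for the Discoverer's output; the last rule is a deliberate "hallucination".
CANDIDATES = [
    Dynamic("collect_wood", frozenset(), "wood"),
    Dynamic("place_table", frozenset({"wood"}), "table"),
    Dynamic("make_wood_pickaxe", frozenset({"wood", "table"}), "wood_pickaxe"),
    Dynamic("collect_stone", frozenset({"wood_pickaxe"}), "stone"),
    Dynamic("collect_diamond", frozenset({"wood_pickaxe"}), "diamond"),
]


def verify(dynamics: List[Dynamic], demos: List[Transition]) -> List[Dynamic]:
    """Keep rules observed at least once whose predicted success matches every observed outcome."""
    verified = []
    for d in dynamics:
        steps = [t for t in demos if t.action == d.action]
        consistent = all(
            (d.requires <= t.inventory) == (t.gained == d.yields) for t in steps
        )
        if steps and consistent:
            verified.append(d)
    return verified


def evolve(dynamics: List[Dynamic], inventory: frozenset, goal: str) -> Optional[List[str]]:
    """Chain verified rules backward from `goal`; return an action plan, or None if unreachable."""
    by_item = {d.yields: d for d in dynamics}
    plan: List[str] = []
    have = set(inventory)

    def achieve(item: str, visiting: frozenset) -> bool:
        if item in have:
            return True
        rule = by_item.get(item)
        if rule is None or item in visiting:  # unknown item or cyclic requirement
            return False
        for req in sorted(rule.requires):
            if not achieve(req, visiting | {item}):
                return False
        plan.append(rule.action)
        have.add(item)
        return True

    return plan if achieve(goal, frozenset()) else None


if __name__ == "__main__":
    verified = verify(CANDIDATES, DEMOS)
    print([d.action for d in verified])
    print(evolve(verified, frozenset(), "stone"))
    print(evolve(verified, frozenset(), "diamond"))
```

Running it keeps four rules and rejects the hallucinated `collect_diamond` rule, because the last demonstration step shows that action failing even though the rule's precondition held. `evolve` then returns the plan `["collect_wood", "place_table", "make_wood_pickaxe", "collect_stone"]` for `stone` and `None` for `diamond`, since no verified rule yields it.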
In conclusion, the main contribution of this paper is a framework that learns world dynamics from demonstrations and guides the decision-making of LLMs through online evolution of situational strategies, thereby bridging potential knowledge gaps and making LLM decisions more informed and effective.