PIANIST: Learning Partially Observable World Models with LLMs for Multi-Agent Decision Making

Jonathan Light, Sixue Xing, Yuanzhe Liu, Weiqin Chen, Min Cai, Xiusi Chen, Guanzhi Wang, Wei Cheng, Yisong Yue, Ziniu Hu
2024-11-25
Abstract: Effective extraction of the world knowledge in LLMs for complex decision-making tasks remains a challenge. We propose a framework PIANIST for decomposing the world model into seven intuitive components conducive to zero-shot LLM generation. Given only the natural language description of the game and how input observations are formatted, our method can generate a working world model for fast and efficient MCTS simulation. We show that our method works well on two different games that challenge the planning and decision making skills of the agent for both language and non-language based action taking, without any training on domain-specific training data or explicitly defined world model.
Artificial Intelligence, Machine Learning, Multiagent Systems
What problem does this paper attempt to address?
This paper addresses how to effectively extract world knowledge from large language models (LLMs) for complex decision-making tasks in multi-agent, partially observable environments. Specifically, it proposes a framework named PIANIST, which decomposes the world model into seven intuitive components suited to zero-shot LLM generation:
1. **Information Set (I)**: What the agent observes. Code representing these information sets, together with the natural-language game description, serves as the interface between the real world and the agent.
2. **Hidden State (S)**: Where the agent records any relevant hidden information.
3. **Actor (N)**: Specifies the actor names used in the action function and the reward function.
4. **Action Function (A)**: For larger action spaces, the function returns the top k most likely actions; for language-based actions, the LLM generates the top k text options.
5. **Transition-Reward Function (T, R)**: Combines state prediction and reward allocation for each player to minimize LLM errors. Deterministic transitions further reduce generation errors.
6. **Information Partition Function (P)**: Maps the hidden state to the information set.
7. **Information Realization Function (I → S)**: Maps the information set to the most likely hidden state, enabling the agent to simulate transitions between hidden states.
With this decomposition, PIANIST can use the LLM-generated world model for fast and efficient Monte Carlo Tree Search (MCTS) simulations, solving complex decision-making tasks without domain-specific training data or an explicitly defined world model. The paper demonstrates the method on two different games that challenge the agent's planning and decision-making abilities, one with language-based actions and one with non-language-based actions.
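To make the decomposition concrete, the seven components can be sketched as a small Python interface. This is a minimal illustration, not the paper's generated code: the `WorldModel` container, the function signatures, the toy card-guessing game, and the `simulate_step` helper are all hypothetical names chosen for this example. The sketch shows one simulation step of the kind an MCTS rollout would repeat: realize a hidden state from an information set, apply the deterministic transition-reward function, then map the result back to an information set.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Illustrative types (assumptions, not the paper's representation).
InfoSet = Tuple[str, bool]   # I: (observer name, has a guess been made?)
State = Tuple[int, bool]     # S: (hidden card in {0, 1, 2}, has a guess been made?)
Action = int


@dataclass
class WorldModel:
    """Hypothetical container for the seven components."""
    actors: List[str]                                                             # N
    actions: Callable[[State, int], Tuple[str, List[Action]]]                     # A
    transition_reward: Callable[[State, Action], Tuple[State, Dict[str, float]]]  # T, R
    partition: Callable[[State, str], InfoSet]                                    # P
    realize: Callable[[InfoSet, random.Random], State]                            # I -> S


def actions(state: State, k: int) -> Tuple[str, List[Action]]:
    # Returns the acting player and at most k candidate actions.
    _, done = state
    return "guesser", ([] if done else [0, 1, 2][:k])


def transition_reward(state: State, action: Action):
    # Deterministic transition combined with per-actor rewards.
    card, _ = state
    hit = action == card
    rewards = {"guesser": 1.0 if hit else 0.0, "dealer": 0.0 if hit else 1.0}
    return (card, True), rewards


def partition(state: State, actor: str) -> InfoSet:
    # Hides the card; for brevity only the guesser's view is modelled.
    _, done = state
    return (actor, done)


def realize(info_set: InfoSet, rng: random.Random) -> State:
    # Under a uniform prior every card is equally likely, so sample one.
    _, done = info_set
    return (rng.randrange(3), done)


toy = WorldModel(["guesser", "dealer"], actions, transition_reward, partition, realize)


def simulate_step(model: WorldModel, info_set: InfoSet, action: Action,
                  rng: random.Random):
    """One simulated step: I -> S, then (T, R), then P back to an information set."""
    state = model.realize(info_set, rng)
    next_state, rewards = model.transition_reward(state, action)
    return model.partition(next_state, info_set[0]), rewards
```

A call such as `simulate_step(toy, ("guesser", False), 1, random.Random(0))` returns the guesser's next information set and a reward dictionary. A tree search would call it repeatedly from the agent's current information set, using `actions` to enumerate candidate moves at each node.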