Theory of Mind for Multi-Agent Collaboration via Large Language Models

Huao Li, Yu Quan Chong, Simon Stepputtis, Joseph Campbell, Dana Hughes, Michael Lewis, Katia Sycara
DOI: https://doi.org/10.18653/v1/2023.emnlp-main.13
2024-06-27
Abstract: While Large Language Models (LLMs) have demonstrated impressive accomplishments in both reasoning and planning, their abilities in multi-agent collaboration remain largely unexplored. This study evaluates LLM-based agents in a multi-agent cooperative text game with Theory of Mind (ToM) inference tasks, comparing their performance with Multi-Agent Reinforcement Learning (MARL) and planning-based baselines. We observed evidence of emergent collaborative behaviors and high-order Theory of Mind capabilities among LLM-based agents. Our results reveal limitations in LLM-based agents' planning optimization due to systematic failures in managing long-horizon contexts and hallucination about the task state. We explore the use of explicit belief state representations to mitigate these issues, finding that it enhances task performance and the accuracy of ToM inferences for LLM-based agents.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper examines how large language models (LLMs) perform in multi-agent collaboration, evaluating LLM-based agents through Theory of Mind (ToM) inference tasks. Its core objectives are:

1. **Evaluating the performance of LLM-based agents in cooperative tasks**: The authors designed a text-based game to assess LLM-driven agents in multi-agent cooperation and compared them with multi-agent reinforcement learning (MARL) and planning-based baselines.
2. **Identifying the limitations of LLM-based agents in cooperative efficiency**: The study found that LLM-based agents systematically fail to manage long-horizon contexts and hallucinate about the task state.
3. **Proposing mitigation strategies**: To improve the performance of LLM-based agents, the authors explored explicit belief state representations, which enhanced both task performance and the accuracy of ToM inferences.

Specifically, the authors designed a multi-agent environment simulating a search and rescue task, in which three agents (Alpha, Bravo, and Charlie) collaborate to locate and safely defuse colored bombs scattered across the environment. Each bomb has a unique sequence of stages that must be defused in the correct order using wire cutters, so the agents must coordinate their actions to work efficiently. The environment is a graph of rooms: agents can move between rooms, inspect a bomb's stage sequence, or use wire cutters.

Experimental results show that the team using GPT-4 completed the task in all experiments, while the ChatGPT team failed to complete it within the time limit. After explicit belief state representations were introduced, the efficiency of the GPT-4-based team improved significantly.
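
To make the task description above concrete, here is a minimal Python sketch of a room graph with bombs, wire cutters, and a text-rendered belief state. It is an illustration under stated assumptions, not the authors' environment: the class names (`Bomb`, `World`, `BeliefState`), the room layout, the assignment of cutters to agents, and the rule that an invalid cut is simply rejected are all hypothetical. The belief state here is updated by simple rules, which is only a stand-in for however the study has agents maintain it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Bomb:
    room: int
    sequence: list[str]   # wire colours, in the order they must be cut
    done: int = 0         # number of stages already defused

    def remaining(self) -> list[str]:
        return self.sequence[self.done:]


@dataclass
class World:
    edges: dict[int, set[int]]     # room graph as adjacency sets
    bombs: dict[str, Bomb]
    position: dict[str, int]       # agent name -> current room
    cutters: dict[str, set[str]]   # agent name -> colours it can cut (assumed)

    def move(self, agent: str, room: int) -> bool:
        # Only moves along an edge of the room graph are allowed.
        if room not in self.edges[self.position[agent]]:
            return False
        self.position[agent] = room
        return True

    def inspect(self, agent: str, bomb_id: str) -> Optional[list[str]]:
        # A bomb's remaining sequence is visible only from its own room.
        bomb = self.bombs[bomb_id]
        return bomb.remaining() if bomb.room == self.position[agent] else None

    def cut(self, agent: str, bomb_id: str, colour: str) -> bool:
        # Assumption: an out-of-order or unavailable cut is rejected, no penalty.
        bomb = self.bombs[bomb_id]
        left = bomb.remaining()
        ok = (bomb.room == self.position[agent]
              and colour in self.cutters[agent]
              and bool(left) and left[0] == colour)
        if ok:
            bomb.done += 1
        return ok


@dataclass
class BeliefState:
    """Explicit record of what one agent currently believes about the bombs."""
    remaining: dict[str, list[str]] = field(default_factory=dict)

    def record(self, bomb_id: str, seq: Optional[list[str]]) -> None:
        if seq is not None:   # None means the bomb was not observable
            self.remaining[bomb_id] = list(seq)

    def to_text(self) -> str:
        # Rendered as text so it could be placed in front of an LLM prompt.
        if not self.remaining:
            return "No bomb sequences known yet."
        return "; ".join(
            f"{b}: " + ("defused" if not s else "remaining " + "-".join(s))
            for b, s in sorted(self.remaining.items())
        )


world = World(
    edges={0: {1}, 1: {0, 2}, 2: {1}},
    bombs={"bomb1": Bomb(room=1, sequence=["red", "blue"])},
    position={"Alpha": 0, "Bravo": 1},
    cutters={"Alpha": {"red"}, "Bravo": {"blue"}},
)
belief = BeliefState()
world.move("Alpha", 1)
belief.record("bomb1", world.inspect("Alpha", "bomb1"))
print(belief.to_text())              # bomb1: remaining red-blue
world.cut("Alpha", "bomb1", "red")   # Alpha handles the red stage
world.cut("Bravo", "bomb1", "blue")  # Bravo finishes the bomb
```

In this toy setup neither agent can defuse the bomb alone, which is the kind of dependency that makes coordination necessary. Keeping the belief state as an explicit, compact record is one way to avoid relying on a long interaction history to recall the task state.
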
Additionally, the authors evaluated the ToM reasoning abilities of LLM-based agents and found that these agents exhibited different levels of capability in introspection, first-order ToM reasoning, and second-order ToM reasoning. In summary, this paper provides an in-depth study of the application of LLMs in multi-agent collaboration scenarios, revealing their strengths and limitations, and proposes improvement methods to enhance their collaborative capabilities in complex tasks.
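
The three levels of inference named above differ only in how many agents' beliefs are nested inside the question. The sketch below builds such questions for illustration; the function name `tom_question` and the question wording are hypothetical and are not the prompts used in the study.

```python
def tom_question(fact: str, chain: list[str]) -> str:
    """Build a belief question whose nesting depth equals len(chain).

    chain=[]                    -> introspection (the agent's own knowledge)
    chain=["Bravo"]             -> first-order ToM
    chain=["Bravo", "Charlie"]  -> second-order ToM
    """
    if not chain:
        return f"Do you know {fact}?"
    # Start from the innermost belief and wrap it outward.
    clause = f"{chain[-1]} knows {fact}"
    for name in reversed(chain[:-1]):
        clause = f"{name} thinks that {clause}"
    return f"Do you think {clause}?"


fact = "the remaining sequence of bomb 1"
for chain in ([], ["Bravo"], ["Bravo", "Charlie"]):
    print(tom_question(fact, chain))
```
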