Abstract: While Large Language Models (LLMs) have demonstrated impressive accomplishments in both reasoning and planning, their abilities in multi-agent collaborations remain largely unexplored. This study evaluates LLM-based agents in a multi-agent cooperative text game with Theory of Mind (ToM) inference tasks, comparing their performance with Multi-Agent Reinforcement Learning (MARL) and planning-based baselines. We observed evidence of emergent collaborative behaviors and high-order Theory of Mind capabilities among LLM-based agents. Our results reveal limitations in LLM-based agents' planning optimization due to systematic failures in managing long-horizon contexts and hallucination about the task state. We explore the use of explicit belief state representations to mitigate these issues, finding that it enhances task performance and the accuracy of ToM inferences for LLM-based agents.

What problem does this paper attempt to address?

The paper explores the use of large language models (LLMs) in multi-agent collaboration, evaluating them in a cooperative game that includes Theory of Mind (ToM) reasoning tasks. Its core objectives are:

1. **Evaluating the performance of LLM-based multi-agents in cooperative tasks**: The authors designed a text-based game to assess the capabilities of LLM-driven agents in multi-agent cooperative tasks and compared them with multi-agent reinforcement learning (MARL) and planning-based baselines.
2. **Identifying the limitations of LLM-based agents in cooperative efficiency**: The study found that LLM-based agents systematically fail to manage long-horizon contexts and hallucinate about the task state.
3. **Proposing mitigation strategies**: To improve the performance of LLM-based agents, the authors explored explicit belief state representations, which enhanced both task performance and the accuracy of ToM inferences.

Specifically, the authors designed a multi-agent environment simulating a search and rescue task, in which three agents (Alpha, Bravo, and Charlie) must collaborate to locate and safely defuse colored bombs scattered through the environment. Each bomb has a unique sequence of stages that must be defused in the correct order using wire cutters, and the agents must coordinate their actions to work efficiently. The environment is a graph of rooms: agents can move between rooms, inspect a bomb's stage sequence, or use wire cutters.

Experimental results show that the team using GPT-4 completed the task in all experiments, while the ChatGPT team failed to complete it within the time limit. After explicit belief state representation was introduced, the efficiency of the GPT-4-based team improved significantly.
Additionally, the authors evaluated the ToM reasoning abilities of LLM-based agents and found that these agents exhibited different levels of capability in introspection, first-order ToM reasoning, and second-order ToM reasoning. In summary, this paper provides an in-depth study of the application of LLMs in multi-agent collaboration scenarios, revealing their strengths and limitations, and proposes improvement methods to enhance their collaborative capabilities in complex tasks.
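To make the task setup and the idea of an explicit belief state more concrete, here is a minimal Python sketch. It is not the authors' implementation: the room graph `ROOMS`, the classes `Bomb` and `BeliefState`, the use of colors as stage labels, and the text format produced by `to_text` are all assumptions introduced for illustration. It shows a bomb that only advances when the correct stage is cut, and an agent-side record of observed facts that could be rendered as text and included in an LLM prompt.

```python
from dataclasses import dataclass, field

# Hypothetical room graph (an assumption): room id -> adjacent room ids.
ROOMS = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}


@dataclass
class Bomb:
    room: int
    remaining: list  # stages still to defuse, in order (colors are assumed labels)

    def cut(self, color):
        """Apply a wire cutter; succeeds only if it matches the next stage."""
        if self.remaining and self.remaining[0] == color:
            self.remaining.pop(0)
            return True
        return False

    @property
    def defused(self):
        return not self.remaining


@dataclass
class BeliefState:
    """Explicit record of what one agent believes about the task state."""
    room: int
    bombs: dict = field(default_factory=dict)  # room id -> believed remaining stages

    def move(self, target):
        """Move only along an edge of the room graph."""
        if target not in ROOMS[self.room]:
            return False
        self.room = target
        return True

    def inspect(self, bomb):
        """Record a bomb's remaining stages; only possible in the same room."""
        if bomb.room != self.room:
            return False
        self.bombs[bomb.room] = list(bomb.remaining)
        return True

    def to_text(self):
        """Render the belief state as plain text for inclusion in a prompt."""
        lines = [f"I am in room {self.room}."]
        for room, seq in sorted(self.bombs.items()):
            status = "defused" if not seq else "remaining sequence " + "-".join(seq)
            lines.append(f"Bomb in room {room}: {status}.")
        return "\n".join(lines)


# Usage: an agent moves next to a bomb, inspects it, and summarizes its beliefs.
bomb = Bomb(room=1, remaining=["red", "green"])
alpha = BeliefState(room=0)
alpha.move(1)
alpha.inspect(bomb)
print(alpha.to_text())
```

In this sketch the belief only changes when the agent inspects a bomb, so it can go stale after a teammate acts. Keeping such a record explicit, rather than relying on the model to reconstruct the state from a long interaction history, is the kind of mitigation the summary describes.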

Theory of Mind for Multi-Agent Collaboration via Large Language Models

Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models

Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information

MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration

Large Language Model based Multi-Agents: A Survey of Progress and Challenges

Towards Collaborative Intelligence: Propagating Intentions and Reasoning for Multi-Agent Coordination with Large Language Models

LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models

Emergence of Theory of Mind Collaboration in Multiagent Systems

Theory of Mind in Large Language Models: Examining Performance of 11 State-of-the-Art models vs. Children Aged 7-10 on Advanced Tests

Cooperation on the Fly: Exploring Language Agents for Ad Hoc Teamwork in the Avalon Game

Zero, Finite, and Infinite Belief History of Theory of Mind Reasoning in Large Language Models

Probing the Robustness of Theory of Mind in Large Language Models

Probing the Multi-turn Planning Capabilities of LLMs via 20 Question Games

Building Cooperative Embodied Agents Modularly with Large Language Models

ToM-LM: Delegating Theory of Mind Reasoning to External Symbolic Executors in Large Language Models

Computational Language Acquisition with Theory of Mind

LLM-based Multi-Agent Reinforcement Learning: Current and Future Directions

Embodied LLM Agents Learn to Cooperate in Organized Teams

Synergistic Simulations: Multi-Agent Problem Solving with Large Language Models

Controlling Large Language Model-based Agents for Large-Scale Decision-Making: An Actor-Critic Approach

How FaR Are Large Language Models From Agents with Theory-of-Mind?