Abstract: Grounding the reasoning ability of large language models (LLMs) for embodied tasks is challenging due to the complexity of the physical world. In particular, LLM planning for multi-agent collaboration requires communication among agents or credit assignment as feedback to re-adjust the proposed plans and achieve effective coordination. However, existing methods that overly rely on physical verification or self-reflection suffer from excessive and inefficient querying of LLMs. In this paper, we propose a novel framework for multi-agent collaboration that introduces Reinforced Advantage feedback (ReAd) for efficient self-refinement of plans. Specifically, we perform critic regression to learn a sequential advantage function from LLM-planned data, and then treat the LLM planner as an optimizer to generate actions that maximize the advantage function. It endows the LLM with the foresight to discern whether the action contributes to accomplishing the final task. We provide theoretical analysis by extending advantage-weighted regression in reinforcement learning to multi-agent systems. Experiments on Overcooked-AI and a difficult variant of RoCoBench show that ReAd surpasses baselines in success rate, and also significantly decreases the interaction steps of agents and query rounds of LLMs, demonstrating its high efficiency for grounding LLMs. More results are given at \url{

What problem does this paper attempt to address?

This paper addresses how to efficiently ground the reasoning capabilities of large language models (LLMs) in physical-world tasks that require multi-agent collaboration. Specifically, it focuses on how feedback can be used to refine LLM-generated plans in a multi-agent environment so that agents coordinate more effectively. Existing methods usually rely on physical verification or self-reflection, which leads to excessive and inefficient querying of LLMs. The multi-agent setting makes feedback harder still, because agents must cooperate through communication and negotiation. Existing methods also struggle to evaluate the contribution of an individual action to the team outcome, so their feedback mechanisms either query the LLM too many times or interact with the physical environment too often. To solve these problems, the paper proposes a new framework that introduces Reinforced Advantage feedback (ReAd) for efficient self-refinement of plans. ReAd performs critic regression to learn a sequential advantage function from LLM-planned data, and then treats the LLM planner as an optimizer that generates actions maximizing this advantage function. This gives the LLM the foresight to judge whether an action contributes to completing the final task. The paper also provides theoretical analysis that supports the method by extending advantage-weighted regression from reinforcement learning to multi-agent systems. Experiments on Overcooked-AI and a difficult variant of RoCoBench show that ReAd outperforms the baselines in success rate and significantly reduces both the interaction steps of agents and the query rounds to the LLM, demonstrating its efficiency for grounding LLMs in multi-agent collaborative tasks.
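The feedback idea described above (score a proposed action with a learned advantage function, and re-query the planner only when the score is too low) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names `refine_plan`, `toy_planner`, and `toy_advantage`, the `threshold` and `max_rounds` parameters, and the lookup-table critic are all hypothetical stand-ins. In the paper the advantage function is learned by critic regression and the planner is an LLM.

```python
from typing import Callable, List, Tuple

# (action, advantage score) pairs the critic has already rejected.
Feedback = List[Tuple[str, float]]


def refine_plan(
    state: str,
    propose: Callable[[str, Feedback], str],
    advantage: Callable[[str, str], float],
    threshold: float = 0.0,   # assumed acceptance threshold
    max_rounds: int = 3,      # assumed cap on planner queries
) -> Tuple[str, int]:
    """Query the planner until the critic's score reaches the threshold.

    Returns the chosen action and the number of planner queries used.
    """
    feedback: Feedback = []
    action = propose(state, feedback)
    for rounds in range(1, max_rounds + 1):
        score = advantage(state, action)
        if score >= threshold:
            return action, rounds
        # Record the rejected action so the planner can avoid it.
        feedback.append((action, score))
        if rounds < max_rounds:
            action = propose(state, feedback)
    # No action cleared the threshold: fall back to the best one seen.
    best_action = max(feedback, key=lambda pair: pair[1])[0]
    return best_action, max_rounds


# Hypothetical critic: a fixed table instead of a regressed network.
TOY_ADVANTAGE = {"wait": -0.5, "pick up onion": 0.3}


def toy_advantage(state: str, action: str) -> float:
    return TOY_ADVANTAGE.get(action, -1.0)


def toy_planner(state: str, feedback: Feedback) -> str:
    """Hypothetical planner: return the first candidate not yet rejected."""
    rejected = {action for action, _ in feedback}
    for candidate in ("wait", "pick up onion"):
        if candidate not in rejected:
            return candidate
    return "wait"


if __name__ == "__main__":
    print(refine_plan("pot is empty", toy_planner, toy_advantage))
```

In this toy setup the planner is queried again only after the critic rejects an action, which is the sense in which advantage feedback can reduce both LLM query rounds and environment interaction compared with verifying every plan physically.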

Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration
