Enabling Intelligent Interactions between an Agent and an LLM: A Reinforcement Learning Approach

Bin Hu, Chenyang Zhao, Pu Zhang, Zihao Zhou, Yuanhang Yang, Zenglin Xu, Bin Liu
2024-06-21
Abstract: Large language models (LLMs) encode a vast amount of world knowledge acquired from massive text datasets. Recent studies have demonstrated that LLMs can assist an embodied agent in solving complex sequential decision-making tasks by providing high-level instructions. However, interactions with LLMs can be time-consuming. In many practical scenarios, LLMs require so much storage space that they can only be deployed on remote cloud servers. Additionally, using commercial LLMs can be costly, since providers may charge based on usage frequency. In this paper, we explore how to enable intelligent, cost-effective interactions between a downstream task-oriented agent and an LLM. We find that this problem can be naturally formulated as a Markov decision process (MDP), and propose When2Ask, a reinforcement learning-based approach that learns when it is necessary to query LLMs for high-level instructions to accomplish a target task. On one hand, When2Ask discourages unnecessary, redundant interactions; on the other, it enables the agent to identify and follow useful instructions from the LLM. This lets the agent halt an ongoing plan and transition to a more suitable one based on new environmental observations. Experiments on MiniGrid and Habitat environments that entail planning sub-goals demonstrate that When2Ask learns to solve target tasks with only a few necessary interactions with the LLM, significantly reducing interaction costs in testing environments compared with baseline methods. Our code is available at: https://github.com/ZJLAB-AMMI/LLM4RL.
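The abstract frames "when to ask" as an MDP. The following is a minimal sketch of that idea under assumptions that are ours, not the paper's: a hypothetical two-state toy environment (the current plan is either still valid or stale), two actions (continue or ask the LLM), and a reward of +1 for each step taken with a valid plan minus an assumed per-query cost `ask_cost = 0.1`. Tabular Q-learning is used only for brevity; it is not claimed to be the paper's training algorithm or policy architecture.

```python
import random

# Hypothetical toy MDP for illustration only (not the paper's environment).
VALID, STALE = 0, 1      # state: current plan still valid / invalidated by a new observation
CONTINUE, ASK = 0, 1     # action: keep following the plan / query the LLM for a new plan


def step(state, action, rng, ask_cost=0.1, p_stale=0.2):
    """Simulate one step and return (reward, next_state)."""
    asked = action == ASK
    plan_valid = state == VALID or asked  # assumption: asking always yields a valid plan
    # +1 for progress under a valid plan, minus the assumed cost of each LLM query.
    reward = (1.0 if plan_valid else 0.0) - (ask_cost if asked else 0.0)
    # A valid plan may be invalidated by a new observation; a stale one stays stale.
    if plan_valid and rng.random() >= p_stale:
        return reward, VALID
    return reward, STALE


def learn_ask_policy(n_steps=20000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning over one continuing stream of steps; returns q[state][action]."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]
    state = STALE  # no plan yet at the start
    for _ in range(n_steps):
        if rng.random() < epsilon:
            action = rng.randrange(2)  # explore
        else:
            action = ASK if q[state][ASK] > q[state][CONTINUE] else CONTINUE
        reward, next_state = step(state, action, rng)
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


q = learn_ask_policy()
names = {VALID: "plan valid", STALE: "plan stale"}
policy = {names[s]: ("ask" if q[s][ASK] > q[s][CONTINUE] else "continue") for s in (VALID, STALE)}
print(policy)
```

Because asking while the plan is still valid costs 0.1 without improving anything, its value settles about 0.1 below continuing, so the greedy policy should learn to query only when the plan is stale. The per-query penalty is what turns "ask less" into part of the objective rather than a hand-written rule.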
Artificial Intelligence
What problem does this paper attempt to address?
The paper primarily aims to address the following issues:

1. **Efficient Interaction**: How to achieve intelligent, cost-effective interaction between an agent and a large language model (LLM). Existing methods either query the LLM too often, wasting resources (per-query fees, communication overhead, and inference time), or query it too rarely, so that the agent fails to obtain useful instructions in time to adjust its plan in a complex, changing environment.
2. **Timely Consultation**: Deciding when to request new high-level instructions from the LLM is challenging and normally requires task-specific expertise. For example, when an agent encounters an obstacle, it should recognize the situation, adjust its plan in time, and consult the LLM on how to handle the obstacle.
3. **Reducing Non-informative Interactions**: The number of unnecessary interactions between the agent and the LLM should be reduced while still ensuring that the agent completes the target task.

To address these challenges, the paper proposes When2Ask, a reinforcement learning-based approach that trains an agent to learn when to request high-level instructions from the LLM in order to complete a specific task. By avoiding unnecessary interactions, the method significantly reduces interaction costs in the test environments and improves the task success rate. Specifically, When2Ask adopts a Planner-Actor-Mediator framework:

- **Planner**: A pre-trained LLM that generates high-level instructions.
- **Actor**: Executes the instructions provided by the Planner.
- **Mediator**: Acts as the interface between the Planner and the Actor; it decides when to request new instructions from the Planner and converts observations into text descriptions that the LLM can understand.
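The Mediator's role in the Planner-Actor-Mediator loop can be sketched as follows. Everything here is hypothetical and chosen for illustration: the `Observation` type, the `describe` translator, `fake_llm_planner` (a stub standing in for a real LLM call), and the hand-coded `ask_on_change` rule, which only stands in for the asking policy that When2Ask learns with reinforcement learning. The sketch compares a Mediator that queries at every step with one that queries only when the observation changes.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Observation:
    room: str
    door_blocked: bool


def describe(obs: Observation) -> str:
    """Mediator's translator: convert an observation into a text prompt for the LLM."""
    text = f"You are in the {obs.room}."
    if obs.door_blocked:
        text += " The door ahead is blocked."
    return text


def fake_llm_planner(prompt: str) -> str:
    """Hypothetical stand-in for the LLM Planner; returns a high-level instruction."""
    return "find a key" if "blocked" in prompt else "go to the goal"


AskPolicy = Callable[[Observation, Optional[Observation]], bool]


def always_ask(obs: Observation, prev: Optional[Observation]) -> bool:
    return True


def ask_on_change(obs: Observation, prev: Optional[Observation]) -> bool:
    # Hand-coded rule for illustration only; When2Ask learns this decision with RL.
    return prev is None or obs != prev


def run_episode(observations: List[Observation], ask_policy: AskPolicy) -> Tuple[List[str], int]:
    instruction = ""
    executed: List[str] = []
    queries = 0
    prev: Optional[Observation] = None
    for obs in observations:
        if ask_policy(obs, prev):  # Mediator decides whether to query the Planner
            instruction = fake_llm_planner(describe(obs))
            queries += 1
        executed.append(instruction)  # Actor follows the current instruction
        prev = obs
    return executed, queries


episode = [Observation("hall", False)] * 4 + [Observation("hall", True)] * 3
for policy in (always_ask, ask_on_change):
    actions, n = run_episode(episode, policy)
    print(policy.__name__, n, actions[0], actions[-1])
```

On this seven-step episode both Mediators lead the Actor through the same instructions, but the change-triggered one queries the Planner twice instead of seven times. When2Ask aims for this kind of saving without requiring such a rule to be written by hand.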
Through this approach, When2Ask reduces the cost of interaction between the agent and the LLM while improving the efficiency and success rate of task completion. In experiments on MiniGrid and Habitat, it significantly reduces interaction costs compared with baseline methods while maintaining or improving the task success rate.