Abstract: In recent developments within the research community, the integration of
Large Language Models (LLMs) in creating fully autonomous agents has garnered
significant interest. Despite this, LLM-based agents frequently demonstrate
notable shortcomings in adjusting to dynamic environments and fully grasping
human needs. In this work, we introduce the problem of LLM-based human-agent
collaboration for complex task-solving, exploring their synergistic potential.
In addition, we propose a Reinforcement Learning-based Human-Agent
Collaboration method, ReHAC. This approach includes a policy model designed to
determine the most opportune stages for human intervention within the
task-solving process. We construct a human-agent collaboration dataset to train
this policy model in an offline reinforcement learning environment. Our
validation tests confirm the model's effectiveness. The results demonstrate
that the synergistic efforts of humans and LLM-based agents significantly
improve performance in complex tasks, primarily through well-planned, limited
human intervention. Datasets and code are available at:
https://github.com/XueyangFeng/ReHAC.
What problem does this paper attempt to address?
This paper addresses the following problem: **how humans and agents based on large language models (LLMs) can collaborate to solve complex tasks more effectively**.
Specifically, although LLM-based agents perform well at understanding, planning, and reasoning, they remain limited on complex real-world tasks, especially when they must adapt to dynamic environments and fully understand human needs. To address this, the authors introduce the problem of "LLM-based human-agent collaboration" and propose a reinforcement learning-based human-agent collaboration method (ReHAC), which aims to improve performance on complex tasks through limited, well-planned human intervention.
### Main problems and challenges
1. **Agent capacity limitations**: Existing LLM-based agents are not yet capable enough to reach human-level proficiency on complex, dynamic real-world tasks, especially in fields that demand high precision (such as law or finance).
2. **Human-agent collaboration pattern**: How to define the division of labor between humans and agents, determine the granularity of tool execution, manage active interruptions, and implement multi-level interventions.
3. **Optimal intervention timing**: How to determine the stages of the task-solving process at which human intervention helps most, so that the number of interventions is minimized and task performance is maximized.
### Solutions
The authors propose ReHAC (Reinforcement Learning-based Human-Agent Collaboration), a method that trains a policy model to dynamically identify the most appropriate moments for human intervention during the task-solving process. ReHAC addresses the problems above as follows:
- **Policy model training**: Collect a dataset of tasks completed jointly by humans and LLM-based agents, and train the policy model in an offline reinforcement learning environment.
- **Reward function design**: Define a reward function \(R(s, a)=T(s, a)-\lambda C(s, a)\), where \(T(s, a)\) represents the expected task reward, \(C(s, a)\) represents the number of human interventions, and \(\lambda\) is a penalty coefficient that controls how heavily interventions are penalized.
- **Optimization algorithm**: Use the REINFORCE algorithm to optimize the expected reward, so that the task reward is maximized while the cost of human intervention is minimized.
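The reward and update rule above can be illustrated with a minimal sketch. It is not the authors' implementation: the logistic policy over hand-made feature vectors, the function names (`reward`, `prob_human`, `reinforce_update`), the learning rate, and the toy trajectory are all assumptions made for illustration, and no baseline or other variance-reduction term is included. The sketch computes \(R = T - \lambda C\) for one recorded trajectory and applies a single REINFORCE step, which scales the gradient of the log-probability of each recorded action by that reward.

```python
import math


def reward(task_reward, num_interventions, lam):
    """R = T - lambda * C for one trajectory (C = number of human steps)."""
    return task_reward - lam * num_interventions


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def prob_human(weights, features):
    """Probability that the (hypothetical) logistic policy picks the human."""
    return sigmoid(sum(w * f for w, f in zip(weights, features)))


def reinforce_update(weights, trajectory, total_reward, lr):
    """One REINFORCE step for a two-action policy (0 = agent, 1 = human).

    trajectory is a list of (features, action) pairs taken from offline data.
    For a logistic policy, d/dw log pi(action | features) = (action - p) * features.
    """
    grad = [0.0] * len(weights)
    for features, action in trajectory:
        p = prob_human(weights, features)
        for i, f in enumerate(features):
            grad[i] += (action - p) * f
    # Gradient ascent on the expected reward.
    return [w + lr * total_reward * g for w, g in zip(weights, grad)]


# Toy usage: a three-step trajectory with one human intervention.
# Feature vectors and the task reward are made up for illustration.
trajectory = [([1.0, 0.0], 0), ([1.0, 1.0], 1), ([1.0, 0.0], 0)]
num_interventions = sum(action for _, action in trajectory)
r = reward(task_reward=1.0, num_interventions=num_interventions, lam=0.1)
new_weights = reinforce_update([0.0, 0.0], trajectory, r, lr=0.5)
```

With a positive trajectory reward, the update raises the probability of the recorded choices; a larger \(\lambda\) shrinks the reward of trajectories that rely heavily on the human, so those choices are reinforced less.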
### Experimental verification
The authors conducted experiments on several multi-step reasoning datasets, including HotpotQA, StrategyQA, and InterCode. The results show that ReHAC allocates human interventions effectively and thereby achieves better results on complex tasks.
In conclusion, the main goal of this paper is to explore a new collaboration pattern that combines human intelligence with the capabilities of LLM-based agents to solve complex tasks more effectively.