Abstract: Agents powered by large language models have shown remarkable abilities in solving complex tasks. However, most agent systems remain reactive, limiting their effectiveness in scenarios requiring foresight and autonomous decision-making. In this paper, we tackle the challenge of developing proactive agents capable of anticipating and initiating tasks without explicit human instructions. We propose a novel data-driven approach for this problem. Firstly, we collect real-world human activities to generate proactive task predictions. These predictions are then labeled by human annotators as either accepted or rejected. The labeled data is used to train a reward model that simulates human judgment and serves as an automatic evaluator of the proactiveness of LLM agents. Building on this, we develop a comprehensive data generation pipeline to create a diverse dataset, ProactiveBench, containing 6,790 events. Finally, we demonstrate that fine-tuning models with the proposed ProactiveBench can significantly elicit the proactiveness of LLM agents. Experimental results show that our fine-tuned model achieves an F1-Score of 66.47% in proactively offering assistance, outperforming all open-source and closed-source models. These results highlight the potential of our method in creating more proactive and effective agent systems, paving the way for future advancements in human-agent collaboration.

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

This paper aims to address the issue where large language model (LLM)-driven agent systems primarily rely on passive responses, meaning these systems usually require explicit human instructions to initiate tasks, lacking foresight and autonomous decision-making capabilities. This limitation is particularly evident in scenarios that require advance planning and autonomous service. Therefore, the authors propose a new data-driven approach to develop agent systems capable of proactively predicting and initiating tasks without explicit human instructions. Specifically, the goals of the paper are:

1. **Develop Proactive Agents**: Build an agent system that can predict potential tasks based on environmental observations and user activities, and proactively offer assistance.
2. **Evaluate and Enhance Proactivity**: By constructing a comprehensive data generation pipeline, create a diverse dataset, ProactiveBench, containing 6,790 events, to evaluate and improve the proactivity of agent systems.
3. **Improve User Experience**: Through experimental validation, demonstrate that the fine-tuned models significantly outperform existing open-source and closed-source models in proactively offering assistance, thereby enhancing the overall user experience.

### Main Contributions

- **Dataset Construction**: Collect real-world user activity data, generate proactive task predictions, and create training data through manual annotation for training reward models.
- **Data Generation Pipeline**: Develop an automated data generation pipeline, including environment simulation, event generation, and state maintenance, to produce diverse training data.
- **Model Fine-Tuning**: Fine-tune models such as LLaMA-3.1-8B-Instruct and Qwen2-7B-Instruct, significantly improving their proactive performance.
- **Performance Evaluation**: Through experimental validation, demonstrate the performance of the fine-tuned models on ProactiveBench, particularly in the aspect of proactively offering assistance.

### Experimental Results

- **Performance Comparison**: Experimental results show that the fine-tuned Qwen2-7B-Instruct model achieved an F1 score of 66.47% in proactively offering assistance, significantly outperforming all existing open-source and closed-source models.
- **Reward Model Evaluation**: The trained reward model achieved an F1 score of 91.80% in consistency with human judgments, indicating its effectiveness in evaluating the proactivity of agent systems.

### Conclusion

This paper successfully develops an agent system capable of proactively predicting and initiating tasks by constructing the ProactiveBench dataset and data generation pipeline. Experimental results indicate that this approach not only significantly enhances the proactivity of agent systems but also opens up new possibilities for future human-machine collaboration.
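To make the reported agreement metric concrete, here is a minimal sketch of how an F1 score between a reward model's accept/reject verdicts and human labels could be computed. It assumes the standard binary F1 definition with "accepted" as the positive class; the paper's exact evaluation protocol may differ. The `Judgment` record, the `f1_agreement` helper, and the sample data are hypothetical and only serve as illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Judgment:
    """One proactive task prediction with two verdicts (hypothetical schema)."""
    human_accepts: bool  # label assigned by a human annotator
    model_accepts: bool  # verdict produced by the reward model


def f1_agreement(judgments: Iterable[Judgment]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1), treating 'accepted' as the positive class."""
    items = list(judgments)
    tp = sum(j.human_accepts and j.model_accepts for j in items)
    fp = sum((not j.human_accepts) and j.model_accepts for j in items)
    fn = sum(j.human_accepts and (not j.model_accepts) for j in items)

    # Guard against division by zero when a class never occurs.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up sample: 2 true positives, 1 false negative, 1 false positive, 1 true negative.
sample = [
    Judgment(human_accepts=True, model_accepts=True),
    Judgment(human_accepts=True, model_accepts=True),
    Judgment(human_accepts=True, model_accepts=False),
    Judgment(human_accepts=False, model_accepts=True),
    Judgment(human_accepts=False, model_accepts=False),
]

precision, recall, f1 = f1_agreement(sample)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

The same calculation would apply to scoring an agent's proactive suggestions against accept/reject labels, provided the positive class is defined the same way.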

Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance
