Abstract: Recent advancements in large language models (LLMs) have driven a revolutionary paradigm shift in process automation from Robotic Process Automation to Agentic Process Automation by automating the workflow orchestration procedure based on LLMs. However, existing LLMs (even the advanced OpenAI GPT-4o) still fall short of satisfactory capability in workflow orchestration. To address this limitation, we present WorkflowLLM, a data-centric framework elaborately designed to enhance the capability of LLMs in workflow orchestration. It first constructs a large-scale fine-tuning dataset, WorkflowBench, with 106,763 samples, covering 1,503 APIs from 83 applications across 28 categories. Specifically, the construction process can be divided into three phases: (1) Data Collection: we collect real-world workflow data from Apple Shortcuts and RoutineHub, transcribing them into Python-style code. We further equip them with hierarchical thoughts generated via ChatGPT. (2) Query Expansion: we prompt ChatGPT to generate more task queries to enrich the diversity and complexity of workflows. (3) Workflow Generation: we leverage an annotator model trained on the collected data to generate workflows for the synthesized queries. Finally, we merge the synthetic samples that pass quality confirmation with the collected samples to obtain WorkflowBench. Based on WorkflowBench, we fine-tune Llama-3.1-8B to obtain WorkflowLlama. Our experiments show that WorkflowLlama demonstrates a strong capacity to orchestrate complex workflows, while also achieving notable generalization performance on previously unseen APIs. Additionally, WorkflowBench exhibits robust zero-shot generalization capabilities on an out-of-distribution task planning dataset, T-Eval. Our data and code are available at https://github.com/OpenBMB/WorkflowLLM.
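The three construction phases and the final merge can be pictured as a small Python function. This is a minimal, hypothetical sketch, not the authors' implementation: `Sample`, `build_dataset`, and the three callables are illustrative names. The callables stand in for ChatGPT query expansion, the already-trained annotator model, and the quality-confirmation step, and phase 1 (collecting and transcribing Apple Shortcuts/RoutineHub workflows) is assumed to have already produced `collected`.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    query: str     # natural-language task query
    workflow: str  # Python-style code for the workflow
    source: str    # "collected" or "synthetic"


def build_dataset(
    collected: List[Sample],
    expand_queries: Callable[[List[Sample]], List[str]],
    annotate: Callable[[str], str],
    passes_quality_check: Callable[[Sample], bool],
) -> List[Sample]:
    """Hypothetical sketch of a WorkflowBench-style construction loop."""
    # Phase 2: synthesize new task queries seeded by the collected samples
    new_queries = expand_queries(collected)
    # Phase 3: the annotator model writes a workflow for each synthetic query
    synthetic = [Sample(q, annotate(q), "synthetic") for q in new_queries]
    # Keep only synthetic samples that pass quality confirmation, then merge
    kept = [s for s in synthetic if passes_quality_check(s)]
    return collected + kept


# Toy stubs in place of the real models; set_focus is a hypothetical API name
seed = [Sample("Turn on Do Not Disturb", "set_focus(mode='dnd')", "collected")]
data = build_dataset(
    seed,
    expand_queries=lambda samples: ["Mute phone at 10pm", ""],
    annotate=lambda q: f"# {q}\npass" if q else "",
    passes_quality_check=lambda s: bool(s.query and s.workflow),
)
print(len(data))  # 2
```

With these stubs, the empty synthetic query fails the check and is dropped, so the merged dataset holds the one collected sample plus one synthetic sample. Passing the models in as callables keeps the control flow of the pipeline separate from whichever LLMs fill each role.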

What problem does this paper attempt to address?

The problem this paper addresses is how to enhance the workflow orchestration capability of large language models (LLMs). Specifically, existing LLMs have two main limitations when handling complex workflow orchestration tasks:

1. **Limited action scale**: Current LLMs can only handle small-scale workflows with a limited number of actions. Even the state-of-the-art OpenAI GPT-4 can only manage workflows containing 6.1 actions on average, which falls far short of real-world requirements; workflows in Apple Shortcuts, for example, contain 70.4 actions on average.
2. **Simple logical structure**: Most existing research focuses on generating sequential actions, whereas real-world applications usually involve complex logical structures such as branches and loops. Workflows in Apple Shortcuts, for example, contain 2.6 nested branch/loop structures on average.

To overcome these limitations, the paper proposes the WorkflowLLM framework, which improves the workflow orchestration capability of LLMs through a data-centric approach:

- **Data collection**: Collect high-quality workflow data from Apple Shortcuts and RoutineHub and transcribe it into Python-style code, then use ChatGPT to generate detailed annotations and hierarchical task plans that enrich the data.
- **Query expansion**: Use ChatGPT to generate more diverse task queries, increasing the diversity and complexity of the data.
- **Workflow generation**: Train an annotator model on the collected data and use it to generate workflows for the synthetic queries. A quality-confirmation step filters the generated workflows, and those that pass are merged with the collected samples.

Through these steps, the paper constructs the large-scale fine-tuning dataset WorkflowBench and fine-tunes Llama-3.1-8B on it to obtain WorkflowLlama.
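Both limitations are stated in terms of how many actions a workflow contains and how many branch/loop structures it nests. Because WorkflowBench represents workflows as Python-style code, such statistics can be approximated with Python's standard `ast` module without executing anything. This is a hedged sketch under two assumptions of ours, not the paper's exact metric definitions: every call expression counts as one action, and `if`/`for`/`while` statements count as branch/loop structures. The function `workflow_stats` and the API names in the example workflow (`get_latest_photos`, `delete_photo`, `save_to_album`, `notify`) are hypothetical.

```python
import ast

CONTROL = (ast.If, ast.For, ast.While)  # assumed branch/loop statement types


def workflow_stats(code: str) -> dict:
    """Count actions and branch/loop structures in a Python-style workflow."""
    tree = ast.parse(code)  # parse only; the workflow is never executed
    actions = sum(isinstance(n, ast.Call) for n in ast.walk(tree))
    structures = sum(isinstance(n, CONTROL) for n in ast.walk(tree))

    def depth(node: ast.AST) -> int:
        # Deepest chain of nested branch/loop statements at or below node
        child_max = max((depth(c) for c in ast.iter_child_nodes(node)), default=0)
        return child_max + (1 if isinstance(node, CONTROL) else 0)

    return {"actions": actions, "structures": structures, "max_depth": depth(tree)}


# Hypothetical workflow: a branch nested inside a loop
workflow = '''
photos = get_latest_photos(count=10)
for photo in photos:
    if photo.is_screenshot:
        delete_photo(photo)
    else:
        save_to_album(photo, album="Camera")
notify(text="Done")
'''

print(workflow_stats(workflow))  # {'actions': 4, 'structures': 2, 'max_depth': 2}
```

Parsing rather than executing keeps the measurement safe even though the hypothetical API functions do not exist. Note that an `elif` chain parses as nested `if` statements, so it raises `max_depth` under this definition.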
Experimental results show that WorkflowLlama orchestrates complex workflows well and generalizes to unseen APIs, and it also shows strong zero-shot generalization on the out-of-distribution task planning dataset T-Eval.

WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models
