Automating the Enterprise with Foundation Models

Michael Wornow, Avanika Narayan, Krista Opsahl-Ong, Quinn McIntyre, Nigam H. Shah, Christopher Ré

2024-05-04

Abstract: Automating enterprise workflows could unlock $4 trillion/year in productivity gains. Despite being of interest to the data management community for decades, the ultimate vision of end-to-end workflow automation has remained elusive. Current solutions rely on process mining and robotic process automation (RPA), in which a bot is hard-coded to follow a set of predefined rules for completing a workflow. Through case studies of a hospital and large B2B enterprise, we find that the adoption of RPA has been inhibited by high set-up costs (12-18 months), unreliable execution (60% initial accuracy), and burdensome maintenance (requiring multiple FTEs). Multimodal foundation models (FMs) such as GPT-4 offer a promising new approach for end-to-end workflow automation given their generalized reasoning and planning abilities. To study these capabilities we propose ECLAIR, a system to automate enterprise workflows with minimal human supervision. We conduct initial experiments showing that multimodal FMs can address the limitations of traditional RPA with (1) near-human-level understanding of workflows (93% accuracy on a workflow understanding task) and (2) instant set-up with minimal technical barrier (based solely on a natural language description of a workflow, ECLAIR achieves end-to-end completion rates of 40%). We identify human-AI collaboration, validation, and self-improvement as open challenges, and suggest ways they can be solved with data management techniques. Code is available at:

Software Engineering, Artificial Intelligence, Machine Learning

What problem does this paper attempt to address?

The paper aims to address the issue of enterprise workflow automation, particularly by overcoming the limitations of traditional Robotic Process Automation (RPA) methods through the use of multimodal foundation models such as GPT-4. Specifically, the paper identifies three main problems faced by RPA in practical applications:

1. **High Setup Costs**: Implementing RPA requires a significant amount of time and expertise, typically taking 12 to 18 months from project initiation to deployment. It also requires specialized technical personnel to write automation scripts and integrate with IT infrastructure.
2. **Execution Fragility**: Since RPA relies on hard-coded rules, it cannot adapt to subtle changes in input data, resulting in an initial accuracy rate of about 60%. It requires a long period of improvement to achieve a high accuracy rate (e.g., 95%).
3. **Heavy Maintenance Burden**: RPA systems often require continuous human supervision to verify output results and handle exceptions, adding extra costs and workload.

To address these issues, the authors propose a new system called ECLAIR, which leverages multimodal foundation models to achieve end-to-end workflow automation. The main features of ECLAIR include:

- **Demonstration**: Learning human workflow knowledge by watching video demonstrations and reading documents.
- **Execution**: Planning and executing actions based on visual understanding and reasoning capabilities.
- **Verification**: Utilizing the model for self-monitoring and error correction, reducing the need for human supervision.

The paper presents preliminary experiments demonstrating the potential of ECLAIR in these areas and discusses the current challenges and future research directions.
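
The three stages named in the summary (demonstration, execution, verification) can be pictured as a simple loop around a model call. The sketch below is only an illustration under stated assumptions, not ECLAIR's actual implementation: `WorkflowAgent`, `ModelFn`, `fake_model`, and the prompts are hypothetical names invented here, and the "model" is a deterministic stub rather than a real multimodal foundation model.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-in for a foundation model call: prompt text in, reply text out.
ModelFn = Callable[[str], str]


@dataclass
class WorkflowAgent:
    """Illustrative three-stage agent (assumed design, not the paper's code)."""
    model: ModelFn
    steps: List[str] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

    def demonstrate(self, description: str) -> List[str]:
        # Stage 1: turn a natural-language workflow description into ordered steps.
        reply = self.model(f"List the steps, one per line, for: {description}")
        self.steps = [line.strip() for line in reply.splitlines() if line.strip()]
        return self.steps

    def execute(self, act: Callable[[str], str]) -> List[str]:
        # Stage 2: perform each step via an environment callback and record what happened.
        self.log = [act(step) for step in self.steps]
        return self.log

    def verify(self) -> bool:
        # Stage 3: ask the model whether the recorded observations show completion.
        prompt = "Did the workflow complete? Answer yes or no.\n" + "\n".join(self.log)
        return self.model(prompt).strip().lower().startswith("yes")


def fake_model(prompt: str) -> str:
    # Deterministic stub so the sketch runs without any external service.
    if prompt.startswith("List the steps"):
        return "open invoice form\nenter amount\nclick submit"
    return "yes" if "click submit" in prompt else "no"


agent = WorkflowAgent(model=fake_model)
steps = agent.demonstrate("submit an invoice")
observations = agent.execute(lambda step: f"done: {step}")
completed = agent.verify()
```

In a real system the `act` callback would drive a GUI or API and the model would see screenshots as well as text; here both are reduced to strings to keep the control flow visible.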

WONDERBREAD: A Benchmark for Evaluating Multimodal Foundation Models on Business Process Management Tasks

FlowMind: Automatic Workflow Generation with LLMs

From Words to Workflows: Automating Business Processes

The Case for Developing a Foundation Model for Planning-like Tasks from Scratch

AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents

Incorporating Large Language Models into Production Systems for Enhanced Task Automation and Flexibility

A Case for Business Process-Specific Foundation Models

The Foundations of Computational Management: A Systematic Approach to Task Automation for the Integration of Artificial Intelligence into Existing Workflows

Can Foundation Models Wrangle Your Data?

BMW Agents -- A Framework For Task Automation Through Multi-Agent Collaboration

ProAgent: From Robotic Process Automation to Agentic Process Automation

SmartFlow: Robotic Process Automation using LLMs

AFlow: Automating Agentic Workflow Generation

A framework for implementing robotic process automation projects

Automated Enterprise Architecture Model Mining

Towards Automating the AI Operations Lifecycle

Generating a Low-code Complete Workflow via Task Decomposition and RAG

Towards Automated Solution Recipe Generation for Industrial Asset Management with LLM