Automating the Enterprise with Foundation Models

Michael Wornow, Avanika Narayan, Krista Opsahl-Ong, Quinn McIntyre, Nigam H. Shah, Christopher Re
2024-05-04
Abstract: Automating enterprise workflows could unlock $4 trillion/year in productivity gains. Despite being of interest to the data management community for decades, the ultimate vision of end-to-end workflow automation has remained elusive. Current solutions rely on process mining and robotic process automation (RPA), in which a bot is hard-coded to follow a set of predefined rules for completing a workflow. Through case studies of a hospital and a large B2B enterprise, we find that the adoption of RPA has been inhibited by high set-up costs (12-18 months), unreliable execution (60% initial accuracy), and burdensome maintenance (requiring multiple FTEs). Multimodal foundation models (FMs) such as GPT-4 offer a promising new approach for end-to-end workflow automation given their generalized reasoning and planning abilities. To study these capabilities, we propose ECLAIR, a system to automate enterprise workflows with minimal human supervision. We conduct initial experiments showing that multimodal FMs can address the limitations of traditional RPA with (1) near-human-level understanding of workflows (93% accuracy on a workflow understanding task) and (2) instant set-up with minimal technical barrier (based solely on a natural language description of a workflow, ECLAIR achieves end-to-end completion rates of 40%). We identify human-AI collaboration, validation, and self-improvement as open challenges, and suggest ways they can be solved with data management techniques. Code is available at:
Software Engineering, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper aims to address the issue of enterprise workflow automation, particularly by overcoming the limitations of traditional Robotic Process Automation (RPA) methods through the use of multimodal foundation models such as GPT-4. Specifically, the paper identifies three main problems faced by RPA in practical applications:

1. **High Setup Costs**: Implementing RPA requires a significant amount of time and expertise, typically taking 12 to 18 months from project initiation to deployment. It also requires specialized technical personnel to write automation scripts and integrate with IT infrastructure.
2. **Execution Fragility**: Since RPA relies on hard-coded rules, it cannot adapt to subtle changes in input data, resulting in an initial accuracy rate of about 60%. It requires a long period of improvement to achieve a high accuracy rate (e.g., 95%).
3. **Heavy Maintenance Burden**: RPA systems often require continuous human supervision to verify output results and handle exceptions, adding extra costs and workload.

To address these issues, the authors propose a new system called ECLAIR, which leverages multimodal foundation models to achieve end-to-end workflow automation. The main features of ECLAIR include:

- **Demonstration**: Learning human workflow knowledge by watching video demonstrations and reading documents.
- **Execution**: Planning and executing actions based on visual understanding and reasoning capabilities.
- **Verification**: Utilizing the model for self-monitoring and error correction, reducing the need for human supervision.

The paper presents preliminary experiments demonstrating the potential of ECLAIR in these areas and discusses the current challenges and future research directions.
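
To make the demonstration, execution, and verification stages listed above more concrete, the following is a minimal sketch of how such a loop could be wired together. It is not ECLAIR's actual code or API: the `Model` type, the function names (`demonstrate`, `execute`, `validate`, `run_workflow`), the prompts, the `max_attempts` retry limit, and the `fake_model` stub are all hypothetical, and the stub returns canned text in place of a real multimodal foundation model.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-in for a foundation model call: prompt text in, text out.
# A real system would also pass screenshots or video frames; this sketch does not.
Model = Callable[[str], str]


@dataclass
class WorkflowRun:
    steps: List[str] = field(default_factory=list)    # plan derived from the description
    actions: List[str] = field(default_factory=list)  # actions chosen during execution
    validated: bool = False                           # result of the model's self-check


def demonstrate(model: Model, description: str) -> List[str]:
    """Turn a natural-language workflow description into an ordered step list."""
    reply = model(f"List the steps, one per line, for: {description}")
    return [line.strip() for line in reply.splitlines() if line.strip()]


def execute(model: Model, steps: List[str]) -> List[str]:
    """Ask the model for one concrete action per planned step."""
    return [model(f"Next action for step: {step}") for step in steps]


def validate(model: Model, steps: List[str], actions: List[str]) -> bool:
    """Self-check: the model judges whether the actions completed the steps."""
    verdict = model(f"Did actions {actions} complete steps {steps}? Answer yes or no.")
    return verdict.strip().lower().startswith("yes")


def run_workflow(model: Model, description: str, max_attempts: int = 2) -> WorkflowRun:
    """Plan once, then execute and validate, retrying execution on a failed check."""
    run = WorkflowRun(steps=demonstrate(model, description))
    for _ in range(max_attempts):
        run.actions = execute(model, run.steps)
        run.validated = validate(model, run.steps, run.actions)
        if run.validated:
            break
    return run


def fake_model(prompt: str) -> str:
    """Canned responses so the sketch runs without any external service."""
    if prompt.startswith("List the steps"):
        return "open invoice form\nenter amount\nsubmit"
    if prompt.startswith("Next action"):
        return "click: " + prompt.split(": ", 1)[1]
    return "yes"


if __name__ == "__main__":
    result = run_workflow(fake_model, "submit an invoice")
    print(result.steps, result.actions, result.validated)
```

Keeping the model behind a plain callable means the same loop can be exercised with a stub, as here, or with a real model client. The retry on a failed self-check is one simple way to reduce the human supervision that the summary attributes to RPA maintenance.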