SmartFlow: Robotic Process Automation using LLMs

Arushi Jain, Shubham Paliwal, Monika Sharma, Lovekesh Vig, Gautam Shroff

2024-05-21

Abstract: Robotic Process Automation (RPA) systems face challenges in handling complex processes and diverse screen layouts that require advanced human-like decision-making capabilities. These systems typically rely on pixel-level encoding through drag-and-drop or automation frameworks such as Selenium to create navigation workflows, rather than visual understanding of screen elements. In this context, we present SmartFlow, an AI-based RPA system that uses pre-trained large language models (LLMs) coupled with deep-learning-based image understanding. Our system can adapt to new scenarios, including changes in the user interface and variations in input data, without the need for human intervention. SmartFlow uses computer vision and natural language processing to perceive visible elements on the graphical user interface (GUI) and convert them into a textual representation. This information is then utilized by LLMs to generate a sequence of actions that are executed by a scripting engine to complete an assigned task. To assess the effectiveness of SmartFlow, we have developed a dataset that includes a set of generic enterprise applications with diverse layouts, which we are releasing for research use. Our evaluations on this dataset demonstrate that SmartFlow exhibits robustness across different layouts and applications. SmartFlow can automate a wide range of business processes such as form filling, customer service, invoice processing, and back-office operations. SmartFlow can thus assist organizations in enhancing productivity by automating an even larger fraction of screen-based workflows. The demo-video and dataset are available at

Robotics, Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses the difficulty that current Robotic Process Automation (RPA) systems have with complex processes and diverse screen layouts. Existing RPA systems usually build navigation workflows through pixel-level encoding, using drag-and-drop tools or automation frameworks such as Selenium, rather than by visually understanding screen elements. As a result they lack human-like decision-making capabilities: they adapt poorly to changes in the user interface and struggle with tasks that require visual analysis and natural language understanding.

To address this, the paper proposes SmartFlow, an AI-driven RPA system. SmartFlow combines pre-trained large language models (LLMs) with deep-learning-based image understanding, so it can identify and locate screen elements automatically and generate navigation workflows using the information provided by HTML source code. The system adapts to new scenarios, including changes in the user interface and variations in input data, without human intervention. Using computer vision and natural language processing, SmartFlow perceives the visible elements of the graphical user interface (GUI) and converts them into a textual representation; an LLM then generates a sequence of actions, which a scripting engine executes to complete the assigned task. According to the paper's evaluation, this makes SmartFlow robust across different layouts and applications, which helps organizations improve productivity by automating a larger share of screen-based workflows.
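The perceive, describe, plan, and execute loop summarized above can be illustrated with a minimal Python sketch. It is not SmartFlow's implementation: the `UIElement` schema, the `CLICK`/`TYPE` action format, and every function name are assumptions made for illustration, and `stub_llm` returns a fixed plan in place of a real model call.

```python
from dataclasses import dataclass
import shlex


@dataclass(frozen=True)
class UIElement:
    """A visible GUI element as a detector might report it (hypothetical schema)."""
    label: str  # text associated with the element, e.g. a field caption
    kind: str   # "textbox", "button", ...
    x: int      # left edge of the bounding box, in pixels
    y: int      # top edge of the bounding box, in pixels
    w: int      # width in pixels
    h: int      # height in pixels

    def center(self):
        return (self.x + self.w // 2, self.y + self.h // 2)


def describe_screen(elements):
    """Turn detected elements into a plain-text description for the prompt."""
    return "\n".join(
        f'{e.kind} "{e.label}" at ({e.x}, {e.y}, {e.w}, {e.h})' for e in elements
    )


def stub_llm(screen_text, task):
    """Stand-in for a real LLM call; returns a fixed plan for illustration only."""
    return 'TYPE "First name" "Ada"\nCLICK "Submit"'


def parse_actions(plan):
    """Parse one action per line: CLICK "<label>" or TYPE "<label>" "<text>"."""
    actions = []
    for line in plan.splitlines():
        parts = shlex.split(line)
        if not parts:
            continue
        verb, args = parts[0].upper(), parts[1:]
        if verb == "CLICK" and len(args) == 1:
            actions.append(("click", args[0], None))
        elif verb == "TYPE" and len(args) == 2:
            actions.append(("type", args[0], args[1]))
        else:
            raise ValueError(f"unrecognized action: {line!r}")
    return actions


def execute(actions, elements):
    """Resolve labels to screen coordinates and return the low-level steps.

    A real scripting engine would drive the mouse and keyboard here; this
    sketch only records what it would do.
    """
    by_label = {e.label: e for e in elements}
    steps = []
    for verb, label, text in actions:
        target = by_label[label]  # KeyError if the plan names an unknown element
        steps.append(("click", target.center()))
        if verb == "type":
            steps.append(("type", text))
    return steps


# Usage: a two-element form, planned by the stub and "executed" as a step log.
screen = [
    UIElement("First name", "textbox", 100, 40, 200, 20),
    UIElement("Submit", "button", 100, 100, 80, 30),
]
plan = stub_llm(describe_screen(screen), "Register Ada")
steps = execute(parse_actions(plan), screen)
```

Because the plan refers to elements by label and coordinates are resolved only at execution time, moving the "Submit" button to a different position changes the resolved click point without changing the plan. That is the kind of layout independence the summary attributes to working from a textual representation instead of fixed pixel positions.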

FlowMind: Automatic Workflow Generation with LLMs

AutoFlow: Automated Workflow Generation for Large Language Model Agents

Automated Generation of Executable RPA Scripts from User Interface Logs

Optimizing Structured Data Processing through Robotic Process Automation

From Words to Workflows: Automating Business Processes

AFlow: Automating Agentic Workflow Generation

ProAgent: From Robotic Process Automation to Agentic Process Automation

Automating the Enterprise with Foundation Models

Flow as the Cross-Domain Manipulation Interface

Robotic Process Mining: Vision and Challenges

Automated Discovery of Data Transformations for Robotic Process Automation

Flowy: Supporting UX Design Decisions Through AI-Driven Pattern Annotation in Multi-Screen User Flows

A Goal-Driven Natural Language Interface for Creating Application Integration Workflows

WebRobot: Web Robotic Process Automation using Interactive Programming-by-Demonstration

CAAP: Context-Aware Action Planning Prompting to Solve Computer Tasks with Front-End UI Only

Flowmind2Digital: The First Comprehensive Flowmind Recognition and Conversion Approach

AUTONODE: A Neuro-Graphic Self-Learnable Engine for Cognitive GUI Automation

E-Mail Assistant -- Automation of E-Mail Handling and Management using Robotic Process Automation

Enterprise Robotic Process Automation