Abstract: Robotic Process Automation (RPA) offers a valuable way to automate tasks on the graphical user interface (GUI) efficiently by emulating human interactions, without modifying existing code. However, its broader adoption is constrained by the need for expertise in both scripting languages and workflow design. To address this challenge, we present PromptRPA, a system designed to comprehend various task-related textual prompts (e.g., goals, procedures) and thereby generate and perform the corresponding RPA tasks. PromptRPA incorporates a suite of intelligent agents that mimic human cognitive functions, specializing in interpreting user intent, managing external information for RPA generation, and executing operations on smartphones. The agents can learn from user feedback and continuously improve their performance based on accumulated knowledge. Experimental results indicated a jump in success rate from 22.28% for the baseline to 95.21% with PromptRPA, with an average of 1.66 user interventions required per new task. PromptRPA presents promising applications in fields such as tutorial creation, smart assistance, and customer service.
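
A quick check of what the two reported success rates imply. This is plain arithmetic on the abstract's figures. The failure-rate view assumes that every task that does not succeed counts as a failure.

```python
baseline = 22.28    # baseline success rate (%), as reported
prompt_rpa = 95.21  # PromptRPA success rate (%), as reported

absolute_gain = prompt_rpa - baseline                  # gain in percentage points
relative_gain = prompt_rpa / baseline                  # times more successes
failure_ratio = (100 - baseline) / (100 - prompt_rpa)  # times fewer failures

print(f"+{absolute_gain:.2f} pp, {relative_gain:.2f}x success rate, "
      f"{failure_ratio:.1f}x fewer failures")
```

Put differently, the failure rate falls from 77.72% to 4.79%. That is roughly a sixteenfold reduction, which puts the result in a different light than the roughly fourfold rise in success rate.
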

What problem does this paper attempt to address?

The paper addresses the limitations of Robotic Process Automation (RPA) on smartphones. Traditional RPA systems require users to have programming knowledge and a technical background to design automated task flows, which is a significant barrier for non-technical users. In addition, their fixed operation sequences adapt poorly to constantly changing graphical user interfaces (GUIs), so the resulting scripts are prone to becoming outdated.

To address these issues, the paper proposes PromptRPA, a system designed for smartphone environments that accepts natural-language text prompts and automatically generates the corresponding RPA tasks. It can interpret various task-related descriptions (such as goals or steps) and then generate and execute the matching RPA tasks. PromptRPA adopts a multi-agent framework in which specialized agents handle different stages of a task:

1. **Information Collection**: The analysis agent extracts information from the text prompt and builds a complete functional description. The retrieval agent obtains external knowledge, such as online tutorials, and combines it with the functional description to form a detailed step description.
2. **Instruction Generation**: The parsing agent converts the collected information into a series of standardized instructions.
3. **Operation Mapping**: The ground agent predicts and executes operations on the smartphone based on the generated instructions. The mobile semantic agent extracts semantic information from the mobile interface to support operation mapping. The evaluation agent checks the accuracy and reliability of the predictions.

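
The three stages above can be sketched as a chain of agent functions. This is a minimal, runnable sketch built on several assumptions. Every agent is a deterministic stub with no LLM calls. A "screen" is a list of dictionaries with hypothetical `id`/`text` keys. The names `Instruction`, `Operation`, `SUPPORTED_ACTIONS`, `run_task` and the agent functions are illustrative only and are not PromptRPA's actual interfaces. The sketch only shows how data could flow from prompt to functional description, then to step description, then to standardized instructions, and finally to grounded operations that an evaluator checks.

```python
from __future__ import annotations

from dataclasses import dataclass

SUPPORTED_ACTIONS = {"tap", "type", "scroll"}  # hypothetical action vocabulary


@dataclass(frozen=True)
class Instruction:
    """Hypothetical standardized instruction: an action plus the label of its target."""
    action: str
    target: str


@dataclass(frozen=True)
class Operation:
    """Hypothetical device operation: an action bound to a concrete UI element id."""
    action: str
    element_id: int


# --- Stage 1: information collection ---
def analysis_agent(prompt: str) -> str:
    """Stub: normalize the user's prompt into a functional description."""
    return prompt.strip().lower()


def retrieval_agent(description: str, tutorials: dict[str, list[str]]) -> list[str]:
    """Stub: find external knowledge (a tutorial) whose goal matches the description."""
    for goal, steps in tutorials.items():
        if goal in description:
            return steps
    return []


# --- Stage 2: instruction generation ---
def parsing_agent(steps: list[str]) -> list[Instruction]:
    """Stub: split free-text steps such as 'tap Settings' into action and target."""
    instructions = []
    for step in steps:
        action, _, target = step.partition(" ")
        instructions.append(Instruction(action.lower(), target))
    return instructions


# --- Stage 3: operation mapping ---
def mobile_semantic_agent(screen: list[dict]) -> dict[str, int]:
    """Stub: map the text of each labelled on-screen element to its id."""
    return {el["text"].lower(): el["id"] for el in screen if el.get("text")}


def ground_agent(instr: Instruction, semantics: dict[str, int]) -> Operation | None:
    """Stub: predict which element the instruction refers to (device execution omitted)."""
    element_id = semantics.get(instr.target.lower())
    return None if element_id is None else Operation(instr.action, element_id)


def evaluation_agent(op: Operation | None) -> bool:
    """Stub: accept a prediction only if it exists and uses a supported action."""
    return op is not None and op.action in SUPPORTED_ACTIONS


def run_task(prompt: str, tutorials: dict[str, list[str]],
             screens: list[list[dict]]) -> list[Operation]:
    """Chain the stages; screens[i] is the screen shown before instruction i."""
    description = analysis_agent(prompt)
    steps = retrieval_agent(description, tutorials)
    operations = []
    for instr, screen in zip(parsing_agent(steps), screens):
        op = ground_agent(instr, mobile_semantic_agent(screen))
        if not evaluation_agent(op):
            # A real system might ask the user to intervene here (assumption).
            raise RuntimeError(f"could not ground {instr}")
        operations.append(op)
    return operations


# Hypothetical example inputs: one tutorial and three successive screens.
tutorials = {"turn on dark mode": ["tap Settings", "tap Display", "tap Dark mode"]}
screens = [
    [{"id": 1, "text": "Settings"}, {"id": 2, "text": "Camera"}],
    [{"id": 5, "text": "Display"}, {"id": 6, "text": "Sound"}],
    [{"id": 9, "text": "Dark mode"}],
]
ops = run_task("Turn on dark mode", tutorials, screens)
print(ops)
```

The sketch keeps each agent as a separate function so that any stub could be swapped for a model-backed component without changing `run_task`. Raising on a failed evaluation is a simplification. The summary suggests that failures are instead resolved through user intervention.
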
Experimental results show that PromptRPA substantially improves task success rates, from 22.28% for the baseline to 95.21%, while each new task requires only 1.66 user interventions on average. Moreover, the system's performance continues to improve as users keep using it and intervening. In summary, PromptRPA lowers the barrier to RPA through a text-prompt-driven approach, enabling non-technical users to automate tasks on smartphones easily. It has promising applications, especially in tutorial creation, intelligent assistance, and customer service.
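
The summary states that performance improves as users intervene. Below is a minimal sketch of one way such accumulation could work. It assumes, hypothetically, that each intervention is stored as a mapping from an instruction's target to the UI label the user picked, and that this mapping is reused on later runs. `FeedbackMemory`, `execute` and `ask_user` are illustrative names and not PromptRPA's API. The summary does not describe how the system actually stores knowledge.

```python
from __future__ import annotations

from typing import Callable


class FeedbackMemory:
    """Hypothetical knowledge store: instruction target -> UI label a user picked."""

    def __init__(self) -> None:
        self._corrections: dict[str, str] = {}

    def record(self, target: str, chosen_label: str) -> None:
        self._corrections[target.lower()] = chosen_label.lower()

    def resolve(self, target: str) -> str:
        # Prefer a learned correction; otherwise use the target text unchanged.
        return self._corrections.get(target.lower(), target.lower())


def execute(targets: list[str], screen_labels: set[str], memory: FeedbackMemory,
            ask_user: Callable[[str], str]) -> tuple[list[str], int]:
    """Ground each target against the screen, asking the user only when grounding fails."""
    executed, interventions = [], 0
    for target in targets:
        label = memory.resolve(target)
        if label not in screen_labels:
            # Grounding failed: ask the user, then store the answer for later runs.
            label = ask_user(target).lower()
            memory.record(target, label)
            interventions += 1
        executed.append(label)
    return executed, interventions


memory = FeedbackMemory()
labels = {"wi-fi", "bluetooth", "display"}
answers = {"wireless": "Wi-Fi"}  # what a simulated user would click
first = execute(["wireless", "display"], labels, memory, answers.__getitem__)
second = execute(["wireless", "display"], labels, memory, answers.__getitem__)
print(first, second)
```

On the first run, the unknown target "wireless" costs one intervention. On the second run, the stored correction resolves it, so no intervention is needed. This is only a toy illustration of the general claim that accumulated feedback reduces future interventions.
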

PromptRPA: Generating Robotic Process Automation on Smartphones from Textual Prompts
