PromptRPA: Generating Robotic Process Automation on Smartphones from Textual Prompts

Tian Huang, Chun Yu, Weinan Shi, Zijian Peng, David Yang, Weiqi Sun, Yuanchun Shi
2024-04-03
Abstract: Robotic Process Automation (RPA) offers a valuable solution for efficiently automating tasks on the graphical user interface (GUI), by emulating human interactions, without modifying existing code. However, its broader adoption is constrained by the need for expertise in both scripting languages and workflow design. To address this challenge, we present PromptRPA, a system designed to comprehend various task-related textual prompts (e.g., goals, procedures), thereby generating and performing corresponding RPA tasks. PromptRPA incorporates a suite of intelligent agents that mimic human cognitive functions, specializing in interpreting user intent, managing external information for RPA generation, and executing operations on smartphones. The agents can learn from user feedback and continuously improve their performance based on the accumulated knowledge. Experimental results indicated a performance jump from a 22.28% success rate in the baseline to 95.21% with PromptRPA, requiring an average of 1.66 user interventions for each new task. PromptRPA presents promising applications in fields such as tutorial creation, smart assistance, and customer service.
Human-Computer Interaction
What problem does this paper attempt to address?
The paper aims to address the limitations of Robotic Process Automation (RPA) technology in smartphone applications. Traditional RPA systems require users to have a certain level of programming knowledge and technical background to design automated task flows, which is a significant barrier for non-technical users. Additionally, the fixed operation sequences of these systems are difficult to adapt to constantly changing Graphical User Interfaces (GUI), making them prone to becoming outdated.

To solve the above issues, the paper proposes the PromptRPA system. PromptRPA can accept text prompts provided by users in natural language and automatically generate corresponding RPA tasks. This system is specifically designed for smartphone environments and can understand various task-related text descriptions (such as goals, steps, etc.), thereby generating and executing corresponding RPA tasks.

PromptRPA adopts a multi-agent framework, which includes multiple specialized agents to handle different stages of the task:

1. **Information Collection**: The analysis agent is responsible for extracting information from the text prompts and constructing a complete functional description; the retrieval agent is responsible for obtaining external knowledge, such as online tutorials, and combining it with the functional description to form a detailed step description.
2. **Instruction Generation**: The parsing agent converts the collected information into a series of standardized instructions.
3. **Operation Mapping**: The ground agent predicts and executes operations on the smartphone based on the generated instructions; the mobile semantic agent is responsible for extracting semantic information from the mobile interface to assist operation mapping; the evaluation agent ensures the accuracy and reliability of the predictions.
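The three stages above can be sketched as a simple pipeline. This is a minimal illustration only, not the authors' implementation: every function name, the `Instruction` type, the in-memory `tutorials` dictionary, and the `needs_user` fallback are hypothetical, and each agent is a trivial stub standing in for what the paper describes as an intelligent agent.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Instruction:
    """One standardized step (hypothetical format): an action and a target label."""
    action: str
    target: str


def analysis_agent(prompt: str) -> str:
    # Stub: a real agent would infer a complete functional description.
    return prompt.strip().rstrip(".")


def retrieval_agent(description: str, tutorials: Dict[str, List[str]]) -> List[str]:
    # Stub: look up step text in a hypothetical in-memory tutorial store.
    return tutorials.get(description, [])


def parsing_agent(steps: List[str]) -> List[Instruction]:
    # Assumes each step reads "<action> <target>", e.g. "Tap Settings".
    instructions = []
    for step in steps:
        action, _, target = step.partition(" ")
        instructions.append(Instruction(action.lower(), target))
    return instructions


def semantic_agent(screen: List[str]) -> Set[str]:
    # Stub: normalize the labels of the visible UI elements.
    return {label.lower() for label in screen}


def ground_agent(instr: Instruction, elements: Set[str]) -> Optional[str]:
    # Stub: predict the on-screen element that matches the instruction target.
    key = instr.target.lower()
    return key if key in elements else None


def evaluation_agent(element: Optional[str]) -> bool:
    # Stub: accept a prediction only if some element was found.
    return element is not None


def run_pipeline(prompt: str, tutorials: Dict[str, List[str]], screen: List[str]) -> List[str]:
    # Stage 1: information collection
    description = analysis_agent(prompt)
    steps = retrieval_agent(description, tutorials)
    # Stage 2: instruction generation
    instructions = parsing_agent(steps)
    # Stage 3: operation mapping
    elements = semantic_agent(screen)
    log = []
    for instr in instructions:
        element = ground_agent(instr, elements)
        if evaluation_agent(element):
            log.append(f"{instr.action}:{element}")
        else:
            # Hypothetical fallback: hand the step to the user.
            log.append(f"needs_user:{instr.target}")
    return log


tutorials = {"Turn on dark mode": ["Tap Settings", "Tap Display", "Toggle Dark mode"]}
result = run_pipeline("Turn on dark mode.", tutorials, ["Settings", "Display"])
print(result)
```

In this toy run the third step has no matching element on the screen, so the sketch defers it to the user. That loosely mirrors the user interventions the paper reports, though how PromptRPA actually requests and learns from feedback is not specified here.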
Experimental results show that PromptRPA significantly improves task success rates, from a baseline of 22.28% to 95.21%, while requiring an average of only 1.66 user interventions for each new task. Moreover, the system's performance continues to improve as users interact with it and intervene more. In summary, PromptRPA lowers the barrier to using RPA through a text prompt-driven approach, enabling non-technical users to automate tasks on smartphones. It has promising applications in areas such as tutorial creation, intelligent assistance, and customer service.