MobA: A Two-Level Agent System for Efficient Mobile Task Automation

Zichen Zhu, Hao Tang, Yansi Li, Kunyao Lan, Yixuan Jiang, Hao Zhou, Yixiao Wang, Situo Zhang, Liangtai Sun, Lu Chen, Kai Yu
2024-10-18
Abstract: Current mobile assistants either depend on system APIs or struggle with complex user instructions and diverse interfaces because of limited comprehension and decision-making abilities. To address these challenges, we propose MobA, a novel Mobile phone Agent powered by multimodal large language models (MLLMs) that enhances comprehension and planning capabilities through a two-level agent architecture. The high-level Global Agent (GA) is responsible for understanding user commands, tracking history memory, and planning tasks. The low-level Local Agent (LA) predicts detailed actions in the form of function calls, guided by sub-tasks and memory from the GA. An integrated Reflection Module allows for efficient task completion and enables the system to handle previously unseen complex tasks. MobA demonstrates significant improvements in task execution efficiency and completion rate in real-life evaluations, underscoring the potential of MLLM-empowered mobile assistants.
Subjects: Multiagent Systems, Artificial Intelligence, Computation and Language, Human-Computer Interaction
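
The two-level control flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: every name here (GlobalAgent, LocalAgent, run_task, plan_fn, execute, max_retries) is hypothetical, the MLLM calls are replaced by plain Python callables, and the Reflection Module is reduced to an assumed "check the result and retry" step, since the abstract does not specify how reflection works.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Memory:
    """History shared from the Global Agent to the Local Agent (hypothetical)."""
    history: list[str] = field(default_factory=list)


@dataclass
class GlobalAgent:
    """High level: understands the command, plans sub-tasks, tracks history."""
    plan_fn: Callable[[str], list[str]]  # stand-in for an MLLM planning call
    memory: Memory = field(default_factory=Memory)

    def plan(self, command: str) -> list[str]:
        return self.plan_fn(command)

    def record(self, sub_task: str, action: dict, ok: bool) -> None:
        status = "ok" if ok else "failed"
        self.memory.history.append(f"{sub_task}: {action['name']} ({status})")


class LocalAgent:
    """Low level: turns one sub-task into an action shaped like a function call."""

    def predict(self, sub_task: str, memory: Memory, attempt: int) -> dict:
        # A real system would query an MLLM with the screen state and memory;
        # here a fixed "tap" call is returned purely for illustration.
        return {"name": "tap", "args": {"target": sub_task, "attempt": attempt}}


def run_task(
    command: str,
    ga: GlobalAgent,
    la: LocalAgent,
    execute: Callable[[dict], bool],
    max_retries: int = 2,
) -> bool:
    """Run every planned sub-task; retry a failed action (assumed reflection step)."""
    for sub_task in ga.plan(command):
        for attempt in range(max_retries + 1):
            action = la.predict(sub_task, ga.memory, attempt)
            ok = execute(action)  # assumed to report whether the action succeeded
            ga.record(sub_task, action, ok)
            if ok:
                break
        else:
            return False  # sub-task still failing after all retries
    return True
```

In this sketch the Global Agent owns both the plan and the memory, while the Local Agent only sees one sub-task at a time, which mirrors the division of labor the abstract describes. A caller would supply plan_fn and execute, for example a planner that splits a command into steps and an executor that drives the device.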