Abstract: Autonomous mobile app interaction has become increasingly important with the growing complexity of mobile applications. Developing intelligent agents that can effectively navigate and interact with mobile apps remains a significant challenge. In this paper, we propose an Explainable Behavior Cloning LLM Agent (EBC-LLMAgent), a novel approach that combines large language models (LLMs) with behavior cloning by learning from demonstrations to create intelligent and explainable agents for autonomous mobile app interaction. EBC-LLMAgent consists of three core modules: Demonstration Encoding, Code Generation, and UI Mapping, which work together to capture user demonstrations, generate executable code, and establish accurate correspondence between code and UI elements. We introduce the Behavior Cloning Chain Fusion technique to enhance the generalization capabilities of the agent. Extensive experiments on five popular mobile applications from diverse domains demonstrate the superior performance of EBC-LLMAgent, achieving high success rates in task completion, efficient generalization to unseen scenarios, and the generation of meaningful explanations.

What problem does this paper attempt to address?

This paper aims to develop intelligent agents that can interact with mobile applications autonomously. As mobile applications grow more complex, the need for such agents increases. Traditional mobile application automation usually relies on manually written rules and heuristics, which are labor-intensive and hard to adapt to new scenarios and application updates. These methods are also limited in interpretability and generalization ability.

To address this, the paper proposes the "Explainable Behavior Cloning LLM Agent (EBC-LLMAgent)". By combining large language models (LLMs) with behavior cloning learned from demonstrations, EBC-LLMAgent aims to create intelligent and interpretable agents that can navigate and interact with mobile applications autonomously. The method is intended to reduce manual intervention, improve user productivity, and build user trust by providing transparent explanations that support human-machine collaboration.

The main contributions of EBC-LLMAgent are:

1. **Proposing EBC-LLMAgent**: A method that combines LLMs with behavior cloning learned from demonstrations, enabling agents to learn from user demonstrations and generalize to unseen tasks.
2. **Modular architecture**: Three core components (demonstration encoding, code generation, and UI mapping) work together to capture user demonstrations, generate executable code fragments, and establish an accurate correspondence between the code and UI elements.
3. **Behavior Cloning Chain Fusion technique**: By learning from multiple demonstrations and merging the learned behaviors into a coherent and flexible interaction model, the technique enhances the agent's generalization ability.

With these contributions, the paper reports that EBC-LLMAgent achieves high task-completion rates, efficient generalization to unseen scenarios, and meaningful explanations across five popular mobile applications.
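To make the three-module structure more concrete, the following is a minimal Python sketch of how such a pipeline could be wired together. It is not the authors' implementation: every name here (`DemoStep`, `encode_demonstration`, `generate_code`, `map_ui`, `fuse_chains`, `stub_llm`) is hypothetical, the LLM call is replaced by a stub, and the fusion step is a naive order-preserving de-duplication standing in for the paper's Behavior Cloning Chain Fusion technique, whose details are not given in this summary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class DemoStep:
    """One recorded user action (hypothetical schema, not from the paper)."""
    action: str      # e.g. "tap" or "type"
    target: str      # human-readable label of the UI element
    value: str = ""  # text entered, if any


def encode_demonstration(steps: List[DemoStep]) -> str:
    """Demonstration encoding: serialize steps into numbered text lines."""
    return "\n".join(
        f"{i}. {s.action} '{s.target}'" + (f" with '{s.value}'" if s.value else "")
        for i, s in enumerate(steps, start=1)
    )


def generate_code(encoded: str, llm: Callable[[str], str]) -> str:
    """Code generation: ask an LLM (passed in as a callable) for code."""
    prompt = "Write one UI call per demonstrated step:\n" + encoded
    return llm(prompt)


def map_ui(steps: List[DemoStep], ui_index: Dict[str, str]) -> Dict[str, str]:
    """UI mapping: resolve each step's label to a concrete element id."""
    return {s.target: ui_index[s.target] for s in steps}


def fuse_chains(chains: List[List[DemoStep]]) -> List[DemoStep]:
    """Naive stand-in for chain fusion: concatenate the demonstrations
    in order and drop steps that have already been seen."""
    fused: List[DemoStep] = []
    for chain in chains:
        for step in chain:
            if step not in fused:
                fused.append(step)
    return fused


def stub_llm(prompt: str) -> str:
    """Placeholder for a real LLM: echoes each step line as a comment."""
    step_lines = prompt.splitlines()[1:]  # skip the instruction line
    return "\n".join(f"# step {line}" for line in step_lines)
```

In use, two demonstrations that share a step would first be merged with `fuse_chains`, then passed through `encode_demonstration` and `generate_code(encoded, stub_llm)`, while `map_ui` resolves labels such as "Search" against a dictionary of element ids supplied by the caller. Passing the LLM in as a callable keeps the sketch runnable without any external service.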

Explainable Behavior Cloning: Teaching Large Language Model Agents through Learning by Demonstration
