Explainable Behavior Cloning: Teaching Large Language Model Agents through Learning by Demonstration

Yanchu Guan, Dong Wang, Yan Wang, Haiqing Wang, Renen Sun, Chenyi Zhuang, Jinjie Gu, Zhixuan Chu
2024-10-30
Abstract: Autonomous mobile app interaction has become increasingly important with the growing complexity of mobile applications. Developing intelligent agents that can effectively navigate and interact with mobile apps remains a significant challenge. In this paper, we propose an Explainable Behavior Cloning LLM Agent (EBC-LLMAgent), a novel approach that combines large language models (LLMs) with behavior cloning by learning from demonstrations to create intelligent and explainable agents for autonomous mobile app interaction. EBC-LLMAgent consists of three core modules: Demonstration Encoding, Code Generation, and UI Mapping, which work together to capture user demonstrations, generate executable code, and establish accurate correspondence between code and UI elements. We introduce the Behavior Cloning Chain Fusion technique to enhance the generalization capabilities of the agent. Extensive experiments on five popular mobile applications from diverse domains demonstrate the superior performance of EBC-LLMAgent, achieving high success rates in task completion, efficient generalization to unseen scenarios, and the generation of meaningful explanations.
Computation and Language
What problem does this paper attempt to address?
This paper addresses the problem of building intelligent agents that can interact with mobile applications autonomously, a need that grows as mobile applications become more complex. Traditional mobile automation relies on manually written rules and heuristics, which are labor-intensive to maintain, hard to adapt to new scenarios and application updates, and limited in both interpretability and generalization.

To address this, the paper proposes the Explainable Behavior Cloning LLM Agent (EBC-LLMAgent). By combining large language models (LLMs) with behavior cloning from demonstrations, EBC-LLMAgent aims to create intelligent, interpretable agents that navigate and interact with mobile applications on their own. The approach reduces the need for manual intervention, improves user productivity, and builds user trust by providing transparent explanations that support human-machine collaboration.

The main contributions are:

1. **Proposing EBC-LLMAgent**: A method that combines LLMs with behavior cloning from demonstrations, enabling agents to learn from user demonstrations and generalize to unseen tasks.
2. **Modular architecture**: Three core modules (Demonstration Encoding, Code Generation, and UI Mapping) work together to capture user demonstrations, generate executable code fragments, and establish an accurate correspondence between the code and UI elements.
3. **Behavior Cloning Chain Fusion technique**: By learning from multiple demonstrations and merging the learned behaviors into a coherent and flexible interaction model, the technique enhances the agent's generalization ability.

Through these innovations, EBC-LLMAgent achieves a high task-completion rate, efficient generalization, and meaningful explanation generation across multiple mobile applications.
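
To make the three-module structure concrete, the following is a minimal Python sketch of how Demonstration Encoding, Code Generation, and UI Mapping might fit together. It is an illustration only, not the paper's implementation: every name here (`Step`, `encode_demonstration`, `generate_code`, `map_ui`, `fuse_chains`) is hypothetical, the code generator is a plain template standing in for an LLM, and `fuse_chains` is a naive ordered merge used as a stand-in because the summary does not specify how Behavior Cloning Chain Fusion actually works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One recorded user action (hypothetical representation)."""
    action: str       # e.g. "tap" or "type"
    element: str      # label of the UI element the user acted on
    value: str = ""   # text entered, if any


def encode_demonstration(raw_events):
    """Demonstration Encoding: turn raw (action, element[, value]) tuples into Steps."""
    return [Step(*event) for event in raw_events]


def generate_code(steps):
    """Code Generation: emit one call string per step.

    A fixed template stands in for the LLM here (assumption).
    """
    lines = []
    for step in steps:
        if step.action == "type":
            lines.append(f"type_text({step.element!r}, {step.value!r})")
        else:
            lines.append(f"{step.action}({step.element!r})")
    return lines


def map_ui(steps, ui_index):
    """UI Mapping: resolve each element label to a concrete UI identifier."""
    return {step.element: ui_index[step.element] for step in steps}


def fuse_chains(chains):
    """Naive stand-in for chain fusion: ordered union of steps across demonstrations."""
    seen, fused = set(), []
    for chain in chains:
        for step in chain:
            if step not in seen:
                seen.add(step)
                fused.append(step)
    return fused


if __name__ == "__main__":
    demo = encode_demonstration([
        ("tap", "search_box"),
        ("type", "search_box", "coffee"),
        ("tap", "submit"),
    ])
    for line in generate_code(demo):
        print(line)
    print(map_ui(demo, {"search_box": "id/search", "submit": "id/go"}))
```

The frozen dataclass makes each `Step` hashable, which is what lets the merge in `fuse_chains` deduplicate steps shared between demonstrations while preserving the order in which they were first seen.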