Octo-planner: On-device Language Model for Planner-Action Agents

Wei Chen, Zhiyuan Li, Zhen Guo, Yikang Shen
2024-06-26
Abstract: AI agents have become increasingly significant in various domains, enabling autonomous decision-making and problem-solving. To function effectively, these agents require a planning process that determines the best course of action and then executes the planned actions. In this paper, we present an efficient on-device Planner-Action framework that separates planning and action execution into two distinct components: a planner agent based on Phi-3 Mini, a 3.8 billion parameter LLM optimized for edge devices, and an action agent using the Octopus model for function execution. The planner agent first responds to user queries by decomposing tasks into a sequence of sub-steps, which are then executed by the action agent. To optimize performance on resource-constrained devices, we employ model fine-tuning instead of in-context learning, reducing computational costs and energy consumption while improving response times. Our approach involves using GPT-4 to generate diverse planning queries and responses based on available functions, with subsequent validations to ensure data quality. We fine-tune the Phi-3 Mini model on this curated dataset, achieving a 97% success rate in our in-domain test environment. To address multi-domain planning challenges, we developed a multi-LoRA training method that merges weights from LoRAs trained on distinct function subsets. This approach enables flexible handling of complex, multi-domain queries while maintaining computational efficiency on resource-constrained devices. To support further research, we have open-sourced our model weights at https://huggingface.co/NexaAIDev/octopus-planning. For the demo, please refer to https://www.nexa4ai.com/octo-planner.
Computation and Language, Human-Computer Interaction
What problem does this paper attempt to address?
The main problem this paper addresses is how to build an efficient planner-action framework for resource-constrained devices (such as smartphones), so that AI agents can make decisions and solve problems autonomously on the device itself. Specifically, the authors propose an on-device language model named Octo-planner, which aims to optimize task planning and execution on edge devices.

### Main Problems

1. **Efficient Planning under Resource Constraints**: Existing large language models (LLMs) perform well at complex task planning, but they usually demand substantial computing resources and energy, which makes them difficult to deploy directly on resource-constrained edge devices. The key issue is how to reduce computing cost and energy consumption while preserving performance.
2. **Multi-domain Task Processing**: Real-world tasks often span multiple domains, and a single fine-tuned model may not handle such complex multi-domain queries well. The challenge is to let the model respond flexibly to tasks in different domains while remaining computationally efficient.
3. **Real-time Processing and Privacy Protection**: Many application scenarios require real-time processing and offline operation. When sensitive data is involved in particular, users prefer on-device AI agents that keep their data secure and private.

### Solutions

To address these problems, the authors propose the following:

1. **Separate Planning and Action**: Task planning and action execution are split into two independent modules, the Planner Agent and the Action Agent. The Planner Agent decomposes a user query into a sequence of sub-steps, and the Action Agent executes those sub-steps. This modular design improves the extensibility and interpretability of the system and allows each module to be optimized for its specific task.
2. **Model Fine-tuning instead of In-context Learning**: Rather than relying on in-context learning, the authors fine-tune the Phi-3 Mini model. They use GPT-4 to generate diverse planning queries and responses based on the available functions, validate the generated data, and then fine-tune the model on the resulting curated dataset. Because function descriptions no longer need to be supplied in the prompt, this reduces computing overhead and key-value cache usage, and therefore improves response time. The fine-tuned model reaches a 97% success rate in the in-domain test environment.
3. **Multi-LoRA Training Method**: To handle multi-domain tasks, the authors develop a multi-LoRA (Low-Rank Adaptation) training method that merges LoRA weights trained on distinct function subsets into the same base model. This lets a single model handle complex multi-domain queries on resource-constrained devices while remaining computationally efficient.
4. **Open-source Model Weights**: To support further research and application, the authors have open-sourced the model weights, encouraging more developers and researchers to work in this area.

### Summary

Through these contributions, Octo-planner improves the efficiency of task planning and execution on resource-constrained devices and addresses the challenges of multi-domain task processing, real-time operation, and privacy protection, laying a foundation for more capable and practical on-device AI agents.
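
The multi-LoRA merging idea can be illustrated with a small numerical sketch. The summary does not give the exact merge rule or coefficients, so the code below assumes a common formulation: each adapter contributes a low-rank update `B @ A`, and the merged weight is the base weight plus a weighted sum of those updates. The function names (`matmul`, `merge_loras`), the equal weights of 0.5, and the toy 2x2 matrices are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two dense matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_loras(
    base: Matrix,
    adapters: List[Tuple[Matrix, Matrix]],
    weights: List[float],
) -> Matrix:
    """Return base + sum_i weights[i] * (B_i @ A_i).

    Each adapter is a (B, A) pair, where B is d_out x r and A is r x d_in.
    The weighted-sum rule is an assumption, not the paper's confirmed method.
    """
    if len(adapters) != len(weights):
        raise ValueError("need exactly one weight per adapter")
    merged = [row[:] for row in base]  # copy so the base weights stay untouched
    for (b, a), w in zip(adapters, weights):
        delta = matmul(b, a)  # low-rank update contributed by this adapter
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                merged[i][j] += w * value
    return merged


# Toy example: two rank-1 adapters, standing in for two function subsets.
base = [[1.0, 0.0], [0.0, 1.0]]
lora_domain_a = ([[1.0], [0.0]], [[0.0, 2.0]])  # B is 2x1, A is 1x2
lora_domain_b = ([[0.0], [1.0]], [[4.0, 0.0]])
merged = merge_loras(base, [lora_domain_a, lora_domain_b], [0.5, 0.5])
print(merged)  # [[1.0, 1.0], [2.0, 1.0]]
```

Because the merge produces a single weight matrix of the same shape as the base, inference cost after merging does not grow with the number of adapters, which is consistent with the stated goal of staying efficient on resource-constrained devices.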