Abstract: AI agents have become increasingly significant in various domains, enabling autonomous decision-making and problem-solving. To function effectively, these agents require a planning process that determines the best course of action and then executes the planned actions. In this paper, we present an efficient on-device Planner-Action framework that separates planning and action execution into two distinct components: a planner agent based on Phi-3 Mini, a 3.8 billion parameter LLM optimized for edge devices, and an action agent using the Octopus model for function execution. The planner agent first responds to user queries by decomposing tasks into a sequence of sub-steps, which are then executed by the action agent. To optimize performance on resource-constrained devices, we employ model fine-tuning instead of in-context learning, reducing computational costs and energy consumption while improving response times. Our approach involves using GPT-4 to generate diverse planning queries and responses based on available functions, with subsequent validations to ensure data quality. We fine-tune the Phi-3 Mini model on this curated dataset, achieving a 97% success rate in our in-domain test environment. To address multi-domain planning challenges, we developed a multi-LoRA training method that merges weights from LoRAs trained on distinct function subsets. This approach enables flexible handling of complex, multi-domain queries while maintaining computational efficiency on resource-constrained devices. To support further research, we have open-sourced our model weights at https://huggingface.co/NexaAIDev/octopus-planning. For the demo, please refer to https://www.nexa4ai.com/octo-planner.
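The planner/action split described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the function names (`set_alarm`, `send_message`), the string-splitting planner, and the prefix-matching action step are hypothetical stand-ins for the fine-tuned Phi-3 Mini planner and the Octopus action model, and are not taken from the paper.

```python
from typing import Callable, Dict, List

# Hypothetical on-device functions; the paper's real function set is not shown here.
def set_alarm(time: str) -> str:
    return f"alarm set for {time}"

def send_message(text: str) -> str:
    return f"message sent: {text}"

FUNCTIONS: Dict[str, Callable[[str], str]] = {
    "set_alarm": set_alarm,
    "send_message": send_message,
}

def plan(query: str) -> List[str]:
    """Stand-in for the planner agent: decompose a query into sub-steps.

    A real planner would be a fine-tuned LLM; this stub only splits
    the query on the phrase ' and then '.
    """
    return [step.strip() for step in query.split(" and then ") if step.strip()]

def act(step: str) -> str:
    """Stand-in for the action agent: map one sub-step to one function call."""
    if step.startswith("set an alarm for "):
        return FUNCTIONS["set_alarm"](step[len("set an alarm for "):])
    if step.startswith("text "):
        return FUNCTIONS["send_message"](step[len("text "):])
    raise ValueError(f"no function matches step: {step!r}")

def run(query: str) -> List[str]:
    """Plan first, then execute each sub-step in order."""
    return [act(step) for step in plan(query)]
```

Calling `run("set an alarm for 7am and then text I am up")` plans two sub-steps and executes them in order. Keeping `plan` and `act` behind separate interfaces means either stand-in could be swapped for a model without touching the other.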

What problem does this paper attempt to address?

The paper aims to develop an efficient planner-action framework for resource-constrained devices (such as smartphones) that supports autonomous decision-making and problem-solving. Specifically, the authors propose an on-device language model named Octo-planner, which optimizes task planning and execution on edge devices.

### Main Problems

1. **Efficient Planning under Resource Constraints**: Although existing large language models (LLMs) perform well at complex task planning, they usually demand substantial computing resources and energy, which makes them difficult to deploy directly on resource-constrained edge devices. The key issue is how to reduce computing costs and energy consumption while preserving performance.
2. **Multi-domain Task Processing**: Real-world tasks often span multiple domains, and a single pre-trained model may not handle such complex multi-domain queries well. The challenge is to let the model respond flexibly to tasks in different domains while remaining computationally efficient.
3. **Real-time Processing and Privacy Protection**: Many application scenarios require real-time processing and offline operation. When sensitive data is involved, users prefer local AI agents that keep their data secure and private.

### Solutions

To address these problems, the authors propose the following solutions:

1. **Separate Planning and Action**: Task planning and action execution are split into two independent modules, the Planner Agent and the Action Agent. The Planner Agent decomposes a user query into a series of sub-steps, and the Action Agent executes those sub-steps. This modular design improves the extensibility and interpretability of the system and allows each module to be optimized for its specific task.
2. **Model Fine-tuning instead of In-context Learning**: Rather than relying on in-context learning, the authors fine-tune the Phi-3 Mini model. They use GPT-4 to generate diverse planning queries and responses, validate them, and then fine-tune the model on the resulting high-quality dataset. This approach reduces computational overhead and key-value cache requirements, which improves response speed.
3. **Multi-LoRA Training Method**: To handle multi-domain tasks, the authors developed a multi-LoRA (Low-Rank Adaptation) training method that merges LoRA weights trained on different function subsets into the same base model. This lets a single model handle complex multi-domain queries on resource-constrained devices while remaining computationally efficient.
4. **Open-source Model Weights**: To support further research and application, the authors have open-sourced the model weights.

### Summary

With these contributions, Octo-planner improves the efficiency of task planning and execution on resource-constrained devices and addresses multi-domain task processing, real-time operation, and privacy protection, laying a foundation for more capable and practical on-device AI agents.
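The multi-LoRA idea described above can be sketched numerically. The summary does not give the exact merge rule, so this example assumes a weighted sum of low-rank updates, W' = W + sum_i w_i * (B_i A_i), applied to a single weight matrix; the helper names `matmul` and `merge_loras` and the mixing weights are hypothetical and used only for illustration.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_loras(
    base: Matrix,
    adapters: Sequence[Tuple[Matrix, Matrix]],
    weights: Sequence[float],
) -> Matrix:
    """Return base + sum_i weights[i] * (B_i @ A_i).

    Each adapter is a (B, A) pair of low-rank factors. The weighted-sum
    rule is an assumption for this sketch, not the paper's stated method.
    """
    if len(adapters) != len(weights):
        raise ValueError("need exactly one weight per adapter")
    merged = [row[:] for row in base]  # copy so the base matrix is not mutated
    for (b, a), w in zip(adapters, weights):
        delta = matmul(b, a)  # rank-r update with the same shape as base
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                merged[i][j] += w * value
    return merged

# Two rank-1 adapters, standing in for LoRAs trained on different function subsets.
base = [[1.0, 0.0], [0.0, 1.0]]
lora_a = ([[1.0], [0.0]], [[0.0, 2.0]])
lora_b = ([[0.0], [1.0]], [[4.0, 0.0]])
merged = merge_loras(base, [lora_a, lora_b], [0.5, 0.5])
```

Because the merged result is a single matrix of the same shape as the base weights, inference cost is unchanged compared with serving one adapter, which matches the summary's emphasis on keeping a single model on the device.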

Octo-planner: On-device Language Model for Planner-Action Agents

Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents

LaMMA-P: Generalizable Multi-Agent Long-Horizon Task Allocation and Planning with LM-Driven PDDL Planner

AutoAct: Automatic Agent Learning from Scratch for QA Via Self-Planning

AdaPlanner: Adaptive Planning from Feedback with Language Models

CAMPHOR: Collaborative Agents for Multi-input Planning and High-Order Reasoning On Device

Octopus v2: On-device language model for super agent

ReasonPlanner: Enhancing Autonomous Planning in Dynamic Environments with Temporal Knowledge Graphs and LLMs

LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models

On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability

PlanAgent: A Multi-modal Large Language Agent for Closed-loop Vehicle Motion Planning

Octopus v3: Technical Report for On-device Sub-billion Multimodal AI Agent

One STEP at a time: Language Agents are Stepwise Planners

Octopus: Embodied Vision-Language Programmer from Environmental Feedback

Dynamic Planning for LLM-based Graphical User Interface Automation

Agent-Oriented Planning in Multi-Agent Systems

Interactive Speculative Planning: Enhance Agent Efficiency through Co-design of System and User Interface

Tool-Planner: Task Planning with Clusters across Multiple Tools

Learning adaptive planning representations with natural language guidance