Abstract: Existing policy learning methods predominantly adopt the task-centric paradigm, necessitating the collection of task data in an end-to-end manner. Consequently, the learned policy tends to fail on novel tasks. Moreover, end-to-end learning makes it hard to localize errors in a complex task with multiple stages. To address these challenges, we propose RoboMatrix, a skill-centric and hierarchical framework for scalable task planning and execution. We first introduce a novel skill-centric paradigm that extracts the common meta-skills from different complex tasks. This allows embodied demonstrations to be captured through a skill-centric approach, enabling the completion of open-world tasks by combining learned meta-skills. To fully leverage meta-skills, we further develop a hierarchical framework that decouples complex robot tasks into three interconnected layers: (1) a high-level modular scheduling layer; (2) a middle-level skill layer; and (3) a low-level hardware layer. Experimental results illustrate that our skill-centric and hierarchical framework achieves remarkable generalization performance across novel objects, scenes, tasks, and embodiments. This framework offers a novel solution for robot task planning and execution in open-world scenarios. Our software and hardware are available at https://github.com/WayneMao/RoboMatrix.

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper aims to solve several key problems in existing robot task planning and execution methods:

1. **Low data collection efficiency**: Existing task-centric methods need to collect end-to-end data for each new task, which is time-consuming and resource-intensive for complex tasks.
2. **Poor generalization to new tasks**: Because of the limitations of end-to-end learning, these methods perform poorly on unseen tasks and cannot generate new action sequences.
3. **Difficult error localization**: Because end-to-end learning is a black box, it is hard to determine at which stage an error occurs, especially in complex multi-stage tasks.

To overcome these problems, the paper proposes **RoboMatrix**, a skill-centric hierarchical framework for scalable robot task planning and execution in the open world. The framework improves data collection efficiency, task generalization, and error localization by extracting the meta-skills common to different complex tasks and combining them to complete new tasks.

### Main contributions

1. **Introduced a skill-centric hierarchical framework**: This framework enables scalable robot task planning and execution in open-world scenarios.
2. **Proposed a unified vision-language-action (VLA) model**: This model handles both robot movement and manipulation tasks.
3. **Demonstrated strong generalization to new objects, scenes, tasks, and robot embodiments**.

### Method overview

1. **Skill-centric pipeline**:
   - **Meta-skill extraction**: Extract common meta-skills from different complex tasks and build a skill matrix.
   - **Skill database**: Continuously refine and expand the skill database by collecting and organizing skill data.
2. **Skill model**:
   - **Vision-language-action (VLA) model**: Built on a pre-trained language model (such as Vicuna 1.5) combined with a visual encoder and an action generation module to achieve end-to-end task execution.
   - **Hybrid model**: Handles tasks in unstructured environments, such as object grasping and searching, by combining traditional control methods (such as PD control) with modern detection algorithms (such as YOLO-World).
3. **RoboMatrix framework**:
   - **Modular scheduling layer**: Decomposes complex tasks into sub-task sequences and schedules their execution according to feedback from the skill model.
   - **Skill layer**: Maps sub-task descriptions to specific robot actions and emits a stop signal that indicates whether the current sub-task is complete.
   - **Hardware layer**: Manages the robot's controller and state observer, converts actions into control signals, and updates the robot's state and image in real time.

### Experimental results

The paper verifies the effectiveness of the RoboMatrix framework through a series of experiments:

1. **Meta-skill performance evaluation**: Eight meta-skills were evaluated, and the results showed that the model performed well on both seen and unseen objects and scenes.
2. **Task-level generalization performance**: A five-level generalization evaluation protocol was used to test the model on tasks and scenes of varying difficulty. The results showed that the skill-centric method clearly outperformed the task-centric method on complex tasks.
3. **Cross-embodiment generalization**: The model was deployed directly on different types of robots to verify its adaptability to new robots.
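
The three-layer structure in the method overview can be illustrated with a toy control loop. The sketch below is only a minimal illustration under stated assumptions, not the RoboMatrix implementation: every name in it (`Robot`, `SKILLS`, `PLANS`, `run_task`) is hypothetical, the hand-written skills stand in for the learned VLA and hybrid models, and a fixed lookup table stands in for the scheduling layer's task decomposition.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Robot:
    """Hardware layer stand-in: executes actions and holds the observed state."""
    position: int = 0
    holding: bool = False
    log: List[str] = field(default_factory=list)

    def apply(self, action: str) -> None:
        if action == "step_forward":
            self.position += 1
        elif action == "close_gripper":
            self.holding = True
        elif action == "open_gripper":
            self.holding = False
        else:
            raise ValueError(f"unknown action: {action}")
        self.log.append(action)


# Skill layer: a skill maps the current state to (action, stop_signal).
Skill = Callable[[Robot], Tuple[str, bool]]


def move_to(target: int) -> Skill:
    def skill(robot: Robot) -> Tuple[str, bool]:
        return "step_forward", robot.position >= target
    return skill


def grasp(robot: Robot) -> Tuple[str, bool]:
    return "close_gripper", robot.holding


def release(robot: Robot) -> Tuple[str, bool]:
    return "open_gripper", not robot.holding


# Hypothetical meta-skill table, keyed by sub-task description.
SKILLS: Dict[str, Skill] = {
    "move to the object": move_to(2),
    "grasp the object": grasp,
    "move to the box": move_to(5),
    "release the object": release,
}

# Assumption: a fixed table replaces the scheduler's task decomposition.
PLANS: Dict[str, List[str]] = {
    "put the object in the box": [
        "move to the object", "grasp the object",
        "move to the box", "release the object",
    ],
}


def run_task(task: str, robot: Robot, max_steps: int = 20) -> List[str]:
    """Scheduling layer: run sub-tasks in order, advancing on each stop signal."""
    completed: List[str] = []
    for subtask in PLANS[task]:
        skill = SKILLS[subtask]
        for _ in range(max_steps):
            action, stop = skill(robot)
            if stop:  # the skill reports that the sub-task is complete
                break
            robot.apply(action)
        else:
            # No stop signal within the step budget: name the stalled sub-task.
            raise RuntimeError(f"sub-task did not finish: {subtask!r}")
        completed.append(subtask)
    return completed
```

Calling `run_task("put the object in the box", Robot())` walks through the four sub-tasks in order. Because the scheduler advances one sub-task at a time, a skill that never emits its stop signal is reported by name, which is the kind of per-stage error localization the summary attributes to the skill-centric design.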

### Conclusion

By introducing a skill-centric hierarchical framework, RoboMatrix addresses the shortcomings of existing methods in data collection efficiency, task generalization, and error localization, offering a new approach to robot task planning and execution in the open world.

RoboMatrix: A Skill-centric Hierarchical Framework for Scalable Robot Task Planning and Execution in Open-World
