Abstract: Reinforcement learning has recently achieved impressive success in allowing robots to learn complex motor skills in simulation environments. However, most of these successes are difficult to transfer to physical robots, since current algorithms require a large amount of practical training and complex sim-to-real transfer techniques. To improve the learning efficiency and adaptability of physical robots, this article proposes a guided model-based policy search (GMBPS) algorithm inspired by a hypothetical model-free (MF) and model-based (MB) actor-critic brain implementation. This approach bridges the gap between MF and MB control processes, overcoming the suboptimality of MB methods and speeding up the learning of MF methods. Additionally, a one-step predictive control framework is proposed to minimize the impact of delayed sensorimotor information in real-world tasks. This helps to accurately control the action cycle time and ensures the feasibility of MB planning for physical robots. The simulation and experimental results demonstrate that the proposed approach enables a 6-DOF UR5e robot arm to learn various reaching tasks in a few minutes with better policies and higher learning efficiency.

Note to Practitioners: Reinforcement learning is becoming a popular framework that allows robots to learn complex motor skills without building analytical models of the controlled plants. However, low learning efficiency severely limits its application to physical robots, which have to adapt quickly to dynamically changing environments in micro-data situations. To address the inefficiency of physical robots learning from scratch, this paper proposes an MF and MB fusion control algorithm inspired by a hypothetical MF and MB actor-critic brain implementation. The motion decision process is modeled as an optimization problem with inequality constraints. The global MF value function is incorporated into the MB objective function, extending the short-term optimization into a long-term one to overcome the suboptimality of conventional MB methods. The MB policy is searched with the quadratic penalty method under the guidance of the MF policy, which helps improve the quality of the policy at every decision-making step. Moreover, since the model dynamics are fitted by a probabilistic neural network, the proposed method is not only applicable to joint-driven robots but also provides a feasible solution for the control of various robotic systems with complex dynamics, such as soft robots and musculoskeletal robots.
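
The text above does not give the exact formulation, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' implementation. It uses a toy point-mass model in place of the learned probabilistic network, a hand-written quadratic function in place of the MF critic, a linear rule in place of the MF actor, a scalar action with a box constraint as the inequality constraints, and a simple backtracking gradient descent as the inner solver. All names and constants (`model`, `mf_value`, `mf_policy`, `guided_search`, `DT`, `GAMMA`, `A_MAX`) are hypothetical.

```python
# Toy stand-ins; every name and constant here is hypothetical, not from the paper.
DT, GAMMA, A_MAX = 0.1, 0.95, 1.0

def model(s, a):
    """Predicted next state for s = (position error, velocity)."""
    p, v = s
    return (p + DT * v, v + DT * a)

def reward(s, a):
    """One-step reward: small error and small effort."""
    p, v = s
    return -(p * p + v * v) - 0.01 * a * a

def mf_value(s):
    """Stand-in for the MF critic: a quadratic long-term value estimate."""
    p, v = s
    return -(10.0 * p * p + 6.0 * p * v + 2.0 * v * v)

def mf_policy(s):
    """Stand-in for the MF actor; its output seeds the MB search."""
    p, v = s
    return -0.5 * p - 0.5 * v

def objective(s, a):
    """One-step reward plus the discounted MF value of the predicted state."""
    return reward(s, a) + GAMMA * mf_value(model(s, a))

def constraints(a):
    """Inequality constraints g_i(a) <= 0 encoding |a| <= A_MAX."""
    return (a - A_MAX, -a - A_MAX)

def penalized_cost(s, a, mu):
    """Negated objective plus the quadratic penalty on violated constraints."""
    penalty = sum(max(0.0, g) ** 2 for g in constraints(a))
    return -objective(s, a) + mu * penalty

def descend(f, a, iters=100, h=1e-6):
    """Gradient descent with a central-difference gradient and backtracking."""
    for _ in range(iters):
        g = (f(a + h) - f(a - h)) / (2.0 * h)
        step = 10.0
        # Halve the step until the Armijo sufficient-decrease test passes.
        while step > 1e-12 and f(a - step * g) > f(a) - 0.5 * step * g * g:
            step *= 0.5
        a -= step * g
    return a

def guided_search(s, mu=1.0, growth=10.0, rounds=5):
    """Quadratic penalty method started from the MF action."""
    a = mf_policy(s)
    for _ in range(rounds):
        # Solve the penalized problem, then tighten the penalty weight.
        a = descend(lambda x, m=mu: penalized_cost(s, x, m), a)
        mu *= growth
    return a

if __name__ == "__main__":
    s = (1.0, 0.0)
    print("MF guess:", mf_policy(s), "guided MB action:", guided_search(s))
```

In this toy setting the one-step reward alone does not depend on the sign of the action when the velocity is zero, so the value term supplies the long-term signal that pushes the action from the MF guess toward the lower bound. As is typical for penalty methods, the result only approaches the constraint boundary as the penalty weight grows. A real controller would use vector-valued actions and the learned dynamics model instead of these stand-ins.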

Learning a Set of Interrelated Tasks by Using Sequences of Motor Policies for a Strategic Intrinsically Motivated Learner

Autonomous Open-Ended Learning of Tasks with Non-Stationary Interdependencies

Socially Guided Intrinsic Motivation for Robot Learning of Motor Skills

Active Learning of Inverse Models with Intrinsically Motivated Goal Exploration in Robots

Practice Makes Perfect: Planning to Learn Skill Parameter Policies

Behavior policy learning: Learning multi-stage tasks via solution sketches and model-based controllers

Modeling Long-horizon Tasks as Sequential Interaction Landscapes

Intrinsically Motivated Hierarchical Policy Learning in Multi-objective Markov Decision Processes

Learning to combine primitive skills: A step towards versatile robotic manipulation

Learning to Sequence Robot Behaviors for Visual Navigation

Guided Model-Based Policy Search Method for Fast Motor Learning of Robots with Learned Dynamics

Concept2Robot: Learning Manipulation Concepts from Instructions and Human Demonstrations

Autonomously Achieving Bipedal Locomotion Skill Via Hierarchical Motion Modelling

Active Learning of Abstract Plan Feasibility

Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning

Learning Skills to Navigate without a Master: A Sequential Multi-Policy Reinforcement Learning Algorithm

Hierarchical and Parameterized Learning of Pick-and-place Manipulation from Under-Specified Human Demonstrations

Incremental procedural and sensorimotor learning in cognitive humanoid robots

Example-Driven Model-Based Reinforcement Learning for Solving Long-Horizon Visuomotor Tasks

Language-Conditioned Imitation Learning for Robot Manipulation Tasks

Guided Imitation of Task and Motion Planning