Abstract

Purpose: Current reinforcement learning (RL) algorithms suffer from low learning efficiency and poor generalization, which significantly limits their practical application on real robots. This paper adopts a hybrid model-based and model-free policy search method with multi-timescale value function tuning, so that robots can learn complex motion planning skills in multi-goal, multi-constraint environments from few interactions.

Design/methodology/approach: This paper proposes a goal-conditioned model-based and model-free policy search method with multi-timescale value function tuning. First, the authors construct a multi-goal, multi-constraint policy optimization approach that fuses model-based policy optimization with goal-conditioned, model-free learning. Soft constraints on states and controls are applied to ensure fast and stable policy iteration. Second, an uncertainty-aware multi-timescale value function learning method is proposed. It constructs a multi-timescale value function network and adaptively selects the value function planning timescale according to the value prediction uncertainty. This implicitly reduces the complexity of the value representation and improves the generalization performance of the policy.

Findings: The algorithm enables physical robots to learn generalized skills in real-world environments within a handful of trials. Simulation and experimental results show that it outperforms other relevant model-based and model-free RL algorithms.

Originality/value: This paper combines goal-conditioned RL and the model predictive path integral (MPPI) method into a unified model-based policy search framework. The combination improves the learning efficiency and policy optimality of motor skill learning in multi-goal, multi-constraint environments. An uncertainty-aware multi-timescale value function learning and selection method is proposed to overcome long-horizon problems and improve the resolution of the optimal policy, thereby enhancing the generalization ability of goal-conditioned RL.
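The abstract combines four ingredients: MPPI planning, soft state and control constraints, a goal-conditioned value function used as a terminal cost, and a choice among value functions of several timescales based on prediction uncertainty. The Python sketch below shows how these pieces could fit together on a toy problem. It is not the authors' implementation.

Every component here is a hypothetical stand-in for the learned models and networks in the paper:

- `step` is a 1-D point-robot model.
- The penalty weights in `running_cost` are invented.
- `value_ensembles` contains hand-coded "value heads" indexed by discount factors 0.9 and 0.99, which is an assumed way to represent timescales.
- The rule "pick the head with the lowest ensemble standard deviation" is an assumed uncertainty criterion.

```python
import math
import random
import statistics

DT = 0.1  # hypothetical integration step of the toy model


def step(x, u):
    """Hypothetical 1-D point-robot model, standing in for a learned dynamics model."""
    return x + u * DT


def running_cost(x, u, goal, x_max=2.0, u_max=1.0, w=10.0):
    """Goal distance plus quadratic penalties (soft constraints) on state and control."""
    cost = (x - goal) ** 2
    cost += w * max(0.0, abs(x) - x_max) ** 2  # soft state bound |x| <= x_max
    cost += w * max(0.0, abs(u) - u_max) ** 2  # soft control bound |u| <= u_max
    return cost


def value_ensembles(x, goal):
    """Hypothetical multi-timescale value heads: {discount: ensemble predictions}.

    Toy assumption: the short-timescale head (0.9) grows more uncertain far from
    the goal, while the long-timescale head (0.99) has a constant spread.
    """
    d = abs(x - goal)
    short = [-(d ** 2) + k * 0.5 * d for k in (-1, 0, 1)]
    long_ = [-(d ** 2) + k * 0.1 for k in (-1, 0, 1)]
    return {0.9: short, 0.99: long_}


def select_timescale(ensembles):
    """Pick the head with the lowest ensemble std (assumed rule); return (discount, mean)."""
    gamma = min(ensembles, key=lambda g: statistics.pstdev(ensembles[g]))
    return gamma, statistics.mean(ensembles[gamma])


def mppi_update(x0, goal, nominal, samples=64, sigma=0.5, lam=1.0, rng=None):
    """One MPPI iteration: perturb the nominal controls, weight rollouts by exp(-cost/lam)."""
    if rng is None:
        rng = random.Random()
    horizon = len(nominal)
    noises, costs = [], []
    for _ in range(samples):
        eps = [rng.gauss(0.0, sigma) for _ in range(horizon)]
        x, cost = x0, 0.0
        for t in range(horizon):
            u = nominal[t] + eps[t]
            cost += running_cost(x, u, goal)
            x = step(x, u)
        _, value = select_timescale(value_ensembles(x, goal))
        cost -= value  # terminal cost: negated value of the selected timescale
        noises.append(eps)
        costs.append(cost)
    c_min = min(costs)  # subtract the minimum cost for numerical stability
    weights = [math.exp(-(c - c_min) / lam) for c in costs]
    z = sum(weights)
    return [
        nominal[t] + sum(w * n[t] for w, n in zip(weights, noises)) / z
        for t in range(horizon)
    ]


def run_demo(steps=60, seed=0):
    """Receding-horizon loop: replan, apply the first control, shift the plan."""
    rng = random.Random(seed)
    x, goal = 0.0, 1.0
    plan = [0.0] * 10
    for _ in range(steps):
        plan = mppi_update(x, goal, plan, rng=rng)
        x = step(x, plan[0])
        plan = plan[1:] + [0.0]
    return x


if __name__ == "__main__":
    print(f"final position: {run_demo():.3f} (goal 1.0)")
```

In this toy setup, the long-timescale head is selected far from the goal and the short-timescale head near it. Ensemble spread is only one common proxy for value prediction uncertainty; the paper may measure uncertainty and define timescales differently.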

A goal-conditioned policy search method with multi-timescale value function tuning