Abstract: The ability to autonomously explore and resolve tasks with minimal human guidance is crucial for the self-development of embodied intelligence. Although reinforcement learning methods can largely ease human effort, it is challenging to design reward functions for real-world tasks, especially for high-dimensional robotic control, due to complex relationships among joints and tasks. Recent advances in large language models (LLMs) enable automatic reward function design. However, existing approaches evaluate reward functions by re-training policies from scratch, which places an undue burden on the reward function by expecting it to be effective throughout the whole policy improvement process. We argue for a more practical strategy in robotic autonomy, focusing on refining existing policies with policy-dependent reward functions rather than a universal one. To this end, we propose a novel reward-policy co-evolution framework where the reward function and the learned policy benefit from each other's progressive on-the-fly improvements, resulting in more efficient and higher-performing skill acquisition. Specifically, the reward evolution process translates the robot's previous best reward function and descriptions of the task and environment into text inputs. These inputs are used to query LLMs to generate a dynamic number of reward function candidates, ensuring continuous improvement at each round of evolution. For policy evolution, our method generates new policy populations by hybridizing historically optimal and random policies. Through an improved Bayesian optimization, our approach efficiently and robustly identifies the most capable and plastic reward-policy combination, which then proceeds to the next round of co-evolution. Despite using less data, our approach demonstrates an average normalized improvement of 95.3% across various high-dimensional robotic skill learning tasks.
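The co-evolution loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation, and every name in it (`hybridize`, `train`, `co_evolve`, `toy_propose`, `toy_fitness`) is hypothetical. It makes several simplifying assumptions: a policy is a plain list of weights, the LLM query is replaced by a stub that returns hand-written reward candidates, hybridization is assumed to be a linear blend with mixing weight `alpha`, policy improvement is random hill climbing, and an exhaustive search over reward-policy pairs stands in for the Bayesian optimization that the abstract mentions.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Policy = List[float]                  # toy stand-in for policy parameters
RewardFn = Callable[[Policy], float]  # toy stand-in for a reward function


def hybridize(best: Policy, alpha: float, rng: random.Random) -> Policy:
    """Blend the best policy so far with a freshly sampled random one (assumed form)."""
    fresh = [rng.gauss(0.0, 1.0) for _ in best]
    return [alpha * b + (1.0 - alpha) * f for b, f in zip(best, fresh)]


def train(policy: Policy, reward: RewardFn, steps: int, rng: random.Random) -> Policy:
    """Toy policy improvement: keep random perturbations that raise `reward`."""
    current = list(policy)
    for _ in range(steps):
        trial = [w + rng.gauss(0.0, 0.1) for w in current]
        if reward(trial) > reward(current):
            current = trial
    return current


def co_evolve(
    propose_rewards: Callable[[Optional[RewardFn]], Sequence[RewardFn]],
    task_fitness: Callable[[Policy], float],
    init_policy: Policy,
    rounds: int = 3,
    alphas: Sequence[float] = (0.5, 0.8, 1.0),
    steps: int = 50,
    seed: int = 0,
) -> Tuple[Policy, Optional[RewardFn], float]:
    """Alternate reward proposals and policy refinement, keeping the best pair."""
    rng = random.Random(seed)
    best_policy: Policy = list(init_policy)
    best_reward: Optional[RewardFn] = None
    best_score = task_fitness(best_policy)
    for _ in range(rounds):
        round_best = (best_score, best_policy, best_reward)
        # Stub for the LLM query: candidates are conditioned on the previous best reward.
        for reward in propose_rewards(best_reward):
            # Exhaustive search over mixing weights (stand-in for Bayesian optimization).
            for alpha in alphas:
                start = hybridize(best_policy, alpha, rng)
                policy = train(start, reward, steps, rng)
                score = task_fitness(policy)
                if score > round_best[0]:
                    round_best = (score, policy, reward)
        best_score, best_policy, best_reward = round_best
    return best_policy, best_reward, best_score


TARGET = [1.0, -2.0, 0.5]  # hypothetical optimum of the toy task


def toy_fitness(policy: Policy) -> float:
    """Ground-truth task metric: negative squared distance to TARGET."""
    return -sum((w - t) ** 2 for w, t in zip(policy, TARGET))


def toy_propose(previous: Optional[RewardFn]) -> Sequence[RewardFn]:
    """Stub reward generator; a real system would query an LLM here."""
    l1 = lambda p: -sum(abs(w - t) for w, t in zip(p, TARGET))
    return [l1, toy_fitness]


if __name__ == "__main__":
    policy, _, score = co_evolve(toy_propose, toy_fitness, [0.0, 0.0, 0.0])
    print("final fitness:", score)
```

Because each round starts from the best policy found so far rather than from scratch, the returned fitness in this sketch can never fall below that of the initial policy, which mirrors the abstract's argument for refining existing policies with policy-dependent rewards.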

Curriculum Reinforcement Learning via Morphology-Environment Co-Evolution

Evolving Curricula with Regret-Based Environment Design

Evolving Reservoirs for Meta Reinforcement Learning

EAT-C: Environment-Adversarial sub-Task Curriculum for Efficient Reinforcement Learning

EAT-C: Environment-Adversarial sub-Task Curriculum for RL

Combining a Gradient-Based Method and an Evolution Strategy for Multi-Objective Reinforcement Learning

Non-local Policy Optimization via Diversity-regularized Collaborative Exploration

Continual Multi-Objective Reinforcement Learning Via Reward Model Rehearsal

Embodied intelligence via learning and evolution

Efficient Language-instructed Skill Acquisition via Reward-Policy Co-Evolution

BiERL: A Meta Evolutionary Reinforcement Learning Framework via Bilevel Optimization

Evolutionary Reinforcement Learning via Cooperative Coevolution

Continual Learning for Morphology Control

Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning

Adaptive Evolutionary Reinforcement Learning with Policy Direction

Two-Stage Evolutionary Reinforcement Learning for Enhancing Exploration and Exploitation

Goal-oriented Knowledge Reuse Via Curriculum Evolution for Reinforcement Learning-based Adaptation

An Evolutionary Transfer Reinforcement Learning Framework for Multiagent Systems

MER: Modular Element Randomization for Robust Generalizable Policy in Deep Reinforcement Learning

Improving Generalization in Reinforcement Learning Training Regimes for Social Robot Navigation

Subequivariant Graph Reinforcement Learning in 3D Environments