Abstract:The naive application of Reinforcement Learning algorithms to continuous control problems -- such as locomotion and manipulation -- often results in policies which rely on high-amplitude, high-frequency control signals, known colloquially as bang-bang control. Although such solutions may indeed maximize task reward, they can be unsuitable for real world systems. Bang-bang control may lead to increased wear and tear or energy consumption, and tends to excite undesired second-order dynamics. To counteract this issue, multi-objective optimization can be used to simultaneously optimize both the reward and some auxiliary cost that discourages undesired (e.g. high-amplitude) control. In principle, such an approach can yield the sought after, smooth, control policies. It can, however, be hard to find the correct trade-off between cost and return that results in the desired behavior. In this paper we propose a new constraint-based reinforcement learning approach that ensures task success while minimizing one or more auxiliary costs (such as control effort). We employ Lagrangian relaxation to learn both (a) the parameters of a control policy that satisfies the desired constraints and (b) the Lagrangian multipliers for the optimization. Moreover, we demonstrate that we can satisfy constraints either in expectation or in a per-step fashion, and can even learn a single policy that is able to dynamically trade-off between return and cost. We demonstrate the efficacy of our approach using a number of continuous control benchmark tasks, a realistic, energy-optimized quadruped locomotion task, as well as a reaching task on a real robot arm.

Improving the performance of Learned Controllers in Behavior Trees using Value Function Estimates at Switching Boundaries

Improving the performance of Learned Controllers in Behavior Trees using Value Function Estimates at Switching Boundaries

An Extended Convergence Result for Behaviour Tree Controllers

Behavior Trees in Robot Control Systems

Learning with Training Wheels: Speeding up Training with a Simple Controller for Deep Reinforcement Learning

Behavior Trees in Robotics and AI: An Introduction

On the programming effort required to generate Behavior Trees and Finite State Machines for robotic applications

Reconfigurable Behavior Trees: Towards an Executive Framework Meeting High-level Decision Making and Control Layer Features

Towards Using Behavior Trees in Industrial Automation Controllers

Model predictive control-based value estimation for efficient reinforcement learning

Learnable Behavior Control: Breaking Atari Human World Records via Sample-Efficient Behavior Selection

Simulation Results on Selector Adaptation in Behavior Trees

dtControl: Decision Tree Learning Algorithms for Controller Representation

Optimized Execution of PDDL Plans using Behavior Trees

Blending MPC & Value Function Approximation for Efficient Reinforcement Learning

Value constrained model-free continuous control

Efficient safe learning for controller tuning with experimental validation

Learning-Based Optimal Control with Performance Guarantees for Unknown Systems with Latent States

Near Optimal Behavior via Approximate State Abstraction

Adaptive Manipulation using Behavior Trees

Learning and Executing Re-usable Behaviour Trees from Natural Language Instruction