Abstract: Complex planning and scheduling problems have long been solved using various optimization or heuristic approaches. In recent years, imitation learning, which aims to learn from expert demonstrations, has been proposed as a viable alternative for solving these problems. Generally speaking, imitation learning is designed to learn either the reward (or preference) model or the behavioral policy directly by observing the behavior of an expert. Existing work in imitation learning and inverse reinforcement learning has focused primarily on imitation in unconstrained settings (e.g., no limit on the fuel consumed by a vehicle). However, in many real-world domains, the behavior of an expert is governed not only by reward (or preference) but also by constraints. For instance, decisions for self-driving delivery vehicles depend not only on route preferences/rewards (derived from past demand data) but also on the fuel in the vehicle and the time available. In such problems, imitation learning is challenging because decisions are dictated not only by the reward model but also by a cost-constrained model. In this paper, we provide multiple methods that match expert distributions in the presence of trajectory cost constraints through (a) a Lagrangian-based method; (b) meta-gradients that find a good trade-off between maximizing expected return and minimizing constraint violation; and (c) a cost-violation-based alternating gradient. We empirically show that leading imitation learning approaches imitate cost-constrained behaviors poorly and that our meta-gradient-based approach achieves the best performance.
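To make the Lagrangian idea in (a) concrete, the following is a minimal sketch of the generic Lagrangian relaxation used in cost-constrained reinforcement learning, not the paper's actual implementation. The function names, the `step_size` parameter, and the use of a scalar per-step reward (which in an imitation setting would typically come from a learned discriminator or reward model) are all assumptions made for illustration.

```python
# Hypothetical sketch of a Lagrangian relaxation for a trajectory cost constraint.
# All names and default values here are illustrative assumptions.

def lagrangian_reward(reward: float, cost: float, lam: float) -> float:
    """Penalized per-step signal: reward minus the multiplier-weighted cost."""
    return reward - lam * cost


def update_multiplier(lam: float, avg_episode_cost: float,
                      cost_limit: float, step_size: float = 0.1) -> float:
    """Dual ascent step on the multiplier.

    The multiplier grows when the average episode cost exceeds the limit,
    shrinks when the constraint is satisfied, and is clipped at zero.
    """
    return max(0.0, lam + step_size * (avg_episode_cost - cost_limit))


if __name__ == "__main__":
    lam = 1.0
    # Constraint violated (cost 5.0 > limit 3.0): the multiplier increases.
    lam = update_multiplier(lam, avg_episode_cost=5.0, cost_limit=3.0)
    print("multiplier after violation:", lam)
    # The policy would then be trained on the penalized signal.
    print("penalized reward:", lagrangian_reward(reward=2.0, cost=1.0, lam=lam))
```

In this kind of scheme the policy update and the multiplier update alternate: the policy maximizes the penalized signal for the current multiplier, and the multiplier is then adjusted according to how far the observed trajectory cost is from the limit.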

Approximate Inverse Reinforcement Learning from Vision-based Imitation Learning

Imitation Learning of Hierarchical Driving Model: from Continuous Intention to Continuous Trajectory

Spatiotemporal Costmap Inference for MPC via Deep Inverse Reinforcement Learning

Inverse reinforcement learning for autonomous navigation via differentiable semantic mapping and planning

Learning Navigation Costs from Demonstration with Semantic Observations

Visual Hindsight Self-Imitation Learning for Interactive Navigation

Inverse Model Predictive Control: Learning Optimal Control Cost Functions for MPC

Aggressive Deep Driving: Model Predictive Control with a CNN Cost Model

VOILA: Visual-Observation-Only Imitation Learning for Autonomous Navigation

Evaluation of MPC-based Imitation Learning for Human-like Autonomous Driving

Imitating Cost-Constrained Behaviors in Reinforcement Learning

Towards navigation without precise localization: Weakly supervised learning of goal-directed navigation cost map

How Imitation Learning and Human Factors Can Be Combined in a Model Predictive Control Algorithm for Adaptive Motion Planning and Control

MPC-based Imitation Learning for Safe and Human-like Autonomous Driving

Towards Target-Driven Visual Navigation in Indoor Scenes via Generative Imitation Learning

Offline Deep Model Predictive Control (MPC) for Visual Navigation

Deep Imitative Models for Flexible Inference, Planning, and Control

Model-Based Imitation Learning for Urban Driving

Model Predictive Inferential Control of Neural State-Space Models for Autonomous Vehicle Motion Planning

LVD-NMPC: A Learning-based Vision Dynamics Approach to Nonlinear Model Predictive Control for Autonomous Vehicles