Abstract: Generating reliable autonomous driving policies is an important goal in developing future transportation systems. Deep reinforcement learning has the potential to achieve this goal. However, policies generated by conventional deep reinforcement learning suffer from poor zero-shot generalization in the face of changes in environment conditions. Domain randomization may provide a solution, yet it brings a high degree of variance and unpredictability to the training process. We utilize multi-teacher policy distillation to circumvent this risk. However, conventional multi-teacher policy distillation paradigms present some issues. First, the teacher agent population is not diverse enough to provide the student agent with various knowledge. Second, the student agent only learns the output activations of teacher agents, without fully leveraging the guidance of teacher agents. Third, some teacher agents may dominate the training process, leading to the neglect of knowledge imparted by other teacher agents. To address these issues, we propose a three-stage multi-teacher policy distillation framework. The first stage is based on a k-determinantal point process. Training environments with dissimilar parameter settings are selected, diversifying the pre-trained teacher agents and providing the student agent with various knowledge. In the second stage, a gradient matching mechanism is applied to enable the student agent to benefit from gradients from multiple teacher agents. In the last stage, we propose a regulation mechanism to adaptively adjust the impact of each teacher agent on the student agent. This mechanism ensures balanced influence from each teacher agent. Experimental results show that our proposed framework effectively improves zero-shot generalization performance in environments with unseen conditions. Additionally, we analyze the influence of some key factors of our proposed framework.
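
The first stage, which selects training environments with dissimilar parameter settings, can be illustrated with a small sketch. The abstract does not give the kernel, the parameter space, or the sampling procedure, so everything below is an assumption made for illustration: the two-dimensional parameters (hypothetically friction and traffic density), the RBF similarity kernel, and the function names. The sketch also swaps exact k-DPP sampling for a greedy determinant-maximizing selection, which is a common deterministic stand-in and not necessarily what the authors use.

```python
import math


def rbf_kernel(points, bandwidth=1.0):
    """Similarity matrix: near-identical parameter settings score close to 1."""
    n = len(points)
    return [
        [
            math.exp(
                -sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                / (2 * bandwidth ** 2)
            )
            for j in range(n)
        ]
        for i in range(n)
    ]


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def greedy_diverse_subset(points, k, bandwidth=1.0):
    """Greedily pick k indices whose kernel submatrix has the largest determinant.

    A k-DPP assigns each size-k subset a probability proportional to this
    determinant, so dissimilar settings are favoured. This greedy loop is an
    approximation of the most likely subset, not a k-DPP sampler.
    """
    kernel = rbf_kernel(points, bandwidth)
    selected = []
    for _ in range(k):
        best, best_det = None, -1.0
        for i in range(len(points)):
            if i in selected:
                continue
            idx = selected + [i]
            sub = [[kernel[a][b] for b in idx] for a in idx]
            d = det(sub)
            if d > best_det:
                best, best_det = i, d
        selected.append(best)
    return selected


# Hypothetical candidate environments: (friction, traffic density).
candidates = [(0.2, 0.1), (0.21, 0.1), (0.9, 0.8), (0.5, 0.5), (0.22, 0.12)]
chosen = greedy_diverse_subset(candidates, k=3)
print(chosen)  # -> [0, 2, 3]
```

In this toy setting the near-duplicate settings at indices 1 and 4 are skipped because adding either one to a set that already contains index 0 drives the determinant toward zero. Each selected setting would then be used to pre-train one teacher agent.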

Refactoring Policy for Compositional Generalizability using Self-Supervised Object Proposals

Compositional Substitutivity of Visual Reasoning for Visual Question Answering

Generalization of Compositional Tasks with Logical Specification via Implicit Planning

Exploring the Effect of Primitives for Compositional Generalization in Vision-and-Language

Zero-Shot Policy Transfer with Disentangled Task Representation of Meta-Reinforcement Learning

Generalizable Policy Improvement Via Reinforcement Sampling (Student Abstract)

MER: Modular Element Randomization for Robust Generalizable Policy in Deep Reinforcement Learning

Improving Compositional Generalization Using Iterated Learning and Simplicial Embeddings

Policy composition in reinforcement learning via multi-objective policy optimization

Compositional Generalization by Learning Analytical Expressions

Learning Category-Level Generalizable Object Manipulation Policy Via Generative Adversarial Self-Imitation Learning from Demonstrations

Learning Invariable Semantical Representation from Language for Extensible Policy Generalization

A Multi-teacher Policy Distillation Framework for Enhancing Zero-Shot Generalization of Autonomous Driving Policies

Zero-Shot Compositional Policy Learning via Language Grounding

Synthesizing Programmatic Policy for Generalization Within Task Domain

Focus On What Matters: Separated Models For Visual-Based RL Generalization

Improving Policy Optimization with Generalist-Specialist Learning

A Study of Compositional Generalization in Neural Models

Generalizing to New Tasks via One-Shot Compositional Subgoals

Consistency Regularization Training for Compositional Generalization

Robust Subtask Learning for Compositional Generalization