Abstract: Generating reliable autonomous driving policies is an important goal in developing future transportation systems. Deep reinforcement learning has the potential to achieve this goal. However, policies generated by conventional deep reinforcement learning generalize poorly in the zero-shot setting when environment conditions change. Domain randomization may provide a solution, yet it introduces high variance and unpredictability into the training process. We use multi-teacher policy distillation to circumvent this risk. However, conventional multi-teacher policy distillation paradigms have three issues. First, the teacher agent population is not diverse enough to provide the student agent with diverse knowledge. Second, the student agent learns only the output activations of the teacher agents, without fully leveraging their guidance. Third, some teacher agents may dominate the training process, so that knowledge imparted by other teacher agents is neglected. To address these issues, we propose a three-stage multi-teacher policy distillation framework. The first stage is based on a k-determinantal point process: training environments with dissimilar parameter settings are selected, which diversifies the pre-trained teacher agents and provides the student agent with diverse knowledge. In the second stage, a gradient matching mechanism enables the student agent to benefit from the gradients of multiple teacher agents. In the last stage, we propose a regulation mechanism that adaptively adjusts the impact of each teacher agent on the student agent, ensuring balanced influence from each teacher agent. Experimental results show that our proposed framework effectively improves zero-shot generalization performance in environments with unseen conditions. Additionally, we analyze the influence of several key factors of our proposed framework.
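The first stage, which selects training environments with dissimilar parameter settings, can be illustrated with a minimal sketch. It is not the authors' implementation: it substitutes a greedy determinant-maximizing selection for sampling from a k-determinantal point process, and the RBF similarity kernel, the `length_scale` value, and the two-dimensional candidate parameters are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(params, length_scale=1.0):
    """Similarity matrix with entries exp(-||x_i - x_j||^2 / (2 * length_scale^2))."""
    diffs = params[:, None, :] - params[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * length_scale ** 2))


def greedy_diverse_subset(kernel, k):
    """Greedily pick up to k indices that approximately maximize det(kernel[S, S]).

    A larger determinant means the chosen items are less similar to each other,
    which is the quantity a k-DPP assigns higher probability to.
    """
    n = kernel.shape[0]
    selected = []
    for _ in range(k):
        best_idx, best_logdet = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(kernel[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_logdet:
                best_idx, best_logdet = i, logdet
        if best_idx is None:
            break  # no remaining candidate keeps the submatrix positive definite
        selected.append(best_idx)
    return selected


# Hypothetical candidate environments, each described by two made-up parameters.
# Rows 0/1 and rows 2/3 are near-duplicates; row 4 is far from the others.
candidate_params = np.array([
    [0.00, 0.0],
    [0.05, 0.0],
    [1.00, 1.0],
    [1.05, 1.0],
    [2.00, 0.0],
])

chosen = greedy_diverse_subset(rbf_kernel(candidate_params, length_scale=0.5), k=3)
print("selected environment indices:", chosen)
```

In this toy setting the selection keeps at most one environment from each near-duplicate pair, so teachers trained on the chosen environments would see clearly different conditions. A true k-DPP sampler would draw such subsets at random with probability proportional to the determinant instead of taking a greedy maximum.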

Improving Policy Generalization for Teacher-Student Reinforcement Learning

Theoretically-Grounded Policy Advice from Multiple Teachers in Reinforcement Learning Settings with Applications to Negative Transfer

Learning to Teach Reinforcement Learning Agents

Improving Policy Optimization with Generalist-Specialist Learning

Optimal Exploration Algorithm of Multi-Agent Reinforcement Learning Methods (Student Abstract)

A Multi-teacher Policy Distillation Framework for Enhancing Zero-Shot Generalization of Autonomous Driving Policies

Improving interactive reinforcement learning: What makes a good teacher?

Policy Rehearsing: Training Generalizable Policies for Reinforcement Learning

Learning on a Budget via Teacher Imitation

TGRL: An Algorithm for Teacher Guided Reinforcement Learning

Generalizable Policy Improvement Via Reinforcement Sampling (Student Abstract)

Online Policy Distillation with Decision-Attention

From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning

Guarded Policy Optimization with Imperfect Online Demonstrations

Policy composition in reinforcement learning via multi-objective policy optimization

Get a Head Start: On-Demand Pedagogical Policy Selection in Intelligent Tutoring

A Q-values Sharing Framework for Multiagent Reinforcement Learning under Budget Constraint

Agent-Aware Training for Agent-Agnostic Action Advising in Deep Reinforcement Learning

Data Efficient Training for Reinforcement Learning with Adaptive Behavior Policy Sharing

Refactoring Policy for Compositional Generalizability using Self-Supervised Object Proposals

Policy Optimization with Advantage Regularization for Long-Term Fairness in Decision Systems