A Multi-teacher Policy Distillation Framework for Enhancing Zero-Shot Generalization of Autonomous Driving Policies
Jiachen Yang, Jipeng Zhang
DOI: https://doi.org/10.1109/tvt.2024.3379972
IF: 6.8
2024-01-01
IEEE Transactions on Vehicular Technology
Abstract: Generating reliable autonomous driving policies is an important goal in developing future transportation systems. Deep reinforcement learning has the potential to achieve this goal. However, policies generated by conventional deep reinforcement learning suffer from poor zero-shot generalization in the face of changes in environment conditions. Domain randomization may provide a solution, yet it brings a high degree of variance and unpredictability to the training process. We utilize multi-teacher policy distillation to circumvent this risk. However, conventional multi-teacher policy distillation paradigms present some issues. First, the teacher agent population is not diverse enough to provide the student agent with various knowledge. Second, the student agent only learns the output activations of teacher agents, without fully leveraging the guidance of teacher agents. Third, some teacher agents may dominate the training process, leading to the neglect of knowledge imparted by other teacher agents. To address these issues, we propose a three-stage multi-teacher policy distillation framework. The first stage is based on a k-determinantal point process. Training environments with dissimilar parameter settings are selected, diversifying the pre-trained teacher agents and providing the student agent with various knowledge. In the second stage, a gradient matching mechanism is applied to enable the student agent to benefit from gradients from multiple teacher agents. In the last stage, we propose a regulation mechanism to adaptively adjust the impact of each teacher agent on the student agent. This mechanism ensures balanced influence from each teacher agent. Experimental results show that our proposed framework effectively improves zero-shot generalization performance in environments with unseen conditions. Additionally, we analyze the influence of some key factors of our proposed framework.
telecommunications; engineering, electrical & electronic; transportation science & technology
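
The first stage described in the abstract selects training environments with dissimilar parameter settings. The sketch below illustrates that idea only and is not the authors' implementation: it uses a greedy determinant-maximizing selection as a simple stand-in for true k-determinantal point process sampling. The RBF similarity kernel, the two-dimensional parameter vectors, and the names `rbf_kernel`, `greedy_diverse_subset`, `candidates`, and `chosen` are all assumptions made for the example.

```python
import numpy as np


def rbf_kernel(params, length_scale=1.0):
    """Similarity matrix between parameter vectors (assumed RBF kernel)."""
    diffs = params[:, None, :] - params[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * length_scale ** 2))


def greedy_diverse_subset(kernel, k):
    """Greedily pick up to k indices that maximize the determinant of the
    selected kernel submatrix. A larger determinant means the chosen items
    are less similar to one another. This is a greedy approximation, not
    exact k-DPP sampling."""
    n = kernel.shape[0]
    selected = []
    for _ in range(k):
        best_idx, best_logdet = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(kernel[np.ix_(idx, idx)])
            # Skip singular or numerically invalid submatrices.
            if sign > 0 and logdet > best_logdet:
                best_idx, best_logdet = i, logdet
        if best_idx is None:
            break
        selected.append(best_idx)
    return selected


# Hypothetical environment settings: each row is one candidate environment,
# with two made-up parameters. Rows 0/1 and rows 2/3 are near-duplicates.
candidates = np.array([
    [0.0, 0.00],
    [0.0, 0.01],
    [1.0, 1.00],
    [1.0, 1.01],
    [0.0, 1.00],
])

chosen = greedy_diverse_subset(rbf_kernel(candidates), k=3)
print(chosen)
```

In this toy setup the selection avoids taking both members of a near-duplicate pair, which is the property the abstract attributes to its first stage. One teacher agent would then be pre-trained in each selected environment.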